Ruby on Rails Projects

What I've learned building a Ruby on Rails search engine and analytics

23 Jan 2023

I was given a task: develop a real-time search engine that stores analytics about what users search for. "Hmm, maybe I could try to replicate Netflix's search for movies," I thought. And thus Movie Search was born. It was a 48-hour project that helped me sharpen many concepts and learn new technologies. In this post, I'll share the challenges I encountered and the solutions I devised.

Link to GitHub repository: https://github.com/devaniljr/movie_search
Link to demo: https://moviesearch.devanil.dev/

Design

I like to draw a prototype of every project I work on:

It's good to imagine the desired results.


Midjourney was used to create the logo. For CSS, I'm a big fan of Tailwind. You can start a project with Tailwind by running rails new <app-name> --css tailwind. And always remember to run the server with bin/dev rather than rails server, so that the Tailwind watcher runs alongside it.

The real-time search box

This was the ideal time to experiment with Hotwire in Ruby on Rails. The promise of Hotwire is to add interactivity to the application using HTML rather than JavaScript or JSON. I followed this tutorial and the result was smooth. I simply used a turbo_frame_tag and did some tricks in the search form, and every time I hit enter the results appeared without reloading the page. It was really nice to see it working without much pain.

Instant results with Hotwire.


But I needed something more: results that appear in real time as the user types. A Turbo Frame alone cannot do this. I needed some JavaScript that effectively "hits enter" after x milliseconds and then returns the results. For that I used a small snippet within a Stimulus controller, and voilà, my real-time feature was ready.

It was really fun to try Hotwire for the first time, but I think it's time to dive deep into how it works to become more proficient. I'd like to add real-time updates to the analytics section, but I can't figure out how yet. I hope to solve this in the near future after watching Pragmatic Studio's course.

I started with a simple SQL ILIKE query for the search, but I realized that I needed better results for my analytics to work. This is where Elasticsearch comes in. I had never used it before, but it was very simple to set up using the Searchkick gem and this tutorial. Elasticsearch tries to figure out what the user is looking for and returns the appropriate results. I really liked the way it works and will definitely learn more about it.

The algorithm for analytics

I still had one problem: I needed to save what people search for somewhere and display analytics for it. I was given the requirement that it be scalable and handle a large number of searches per minute. So I decided to store the raw data in Redis, process it with a cron job and Sidekiq, and only then store it in the database for analytics purposes.

That's the fundamental logic:

  1. A person types a search, and their query and the first result displayed are saved in Redis. If the person only types "Inter" and is satisfied because it found "Interstellar", then it will save "Inter" as the query and "Interstellar" as the result.

  2. Every 5 minutes, a job runs in Sidekiq to process the data stored in Redis and work out which movie was searched for. I don't perform the steps listed below in the first phase because, in a real-time search, every keystroke would trigger all of this logic, which would be very expensive.

    2.1. If Elasticsearch returns a clear result, then the algorithm considers it to be the intended result of the search.

    2.2. If there is no result at all, it searches the Movie Database API for one, and if it finds something, that is the intended result.

      2.2.1. In addition, the movie is stored in the database, so if you don't find it in the first query, it will be found in 5 minutes.

    2.3. If still no result is found, the query itself is saved, with no match in the project database.

  3. After determining which movie the user searched for, it registers a hit on three keys in Redis: one for total searches ever, another for searches this year, and another for searches this month.

  4. The result is saved in the Postgres database.

  5. Finally, the temporary key associated with the specific search is deleted from Redis.

The search algorithm.
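
The steps above can be sketched in plain Ruby. This is a minimal sketch, not the code from the repository: Hashes and Arrays stand in for Redis, Postgres, and the movie catalog, a lambda stands in for the Movie Database API, and every class, method, and key name is hypothetical. It also assumes that each keystroke simply overwrites the entry for the current search key.

```ruby
# In-memory stand-ins so the sketch runs without Redis, Elasticsearch, or Postgres.
# All names below are hypothetical and chosen only for illustration.
class SearchAnalytics
  attr_reader :pending, :counters, :saved_rows, :catalog

  def initialize(catalog:, external_api:)
    @catalog = catalog            # local movie titles (the "project database")
    @external_api = external_api  # callable: query -> title or nil (stands in for the external movie API)
    @pending = {}                 # stands in for the temporary Redis keys
    @counters = Hash.new(0)       # stands in for the Redis hit counters
    @saved_rows = []              # stands in for the Postgres analytics table
  end

  # Step 1: runs on every keystroke, so it only does a cheap write.
  # Assumption: later keystrokes of the same search overwrite earlier ones.
  def record(search_key, query, first_result)
    @pending[search_key] = { query: query, first_result: first_result }
  end

  # Steps 2 to 5: meant to run periodically (every 5 minutes in the post).
  def process(now: Time.now)
    @pending.keys.each do |search_key|
      entry = @pending[search_key]
      movie = resolve(entry)
      label = movie || entry[:query]                          # 2.3: fall back to the raw query
      bump_counters(label, now)                               # step 3
      @saved_rows << { query: entry[:query], movie: movie }   # step 4
      @pending.delete(search_key)                             # step 5
    end
  end

  private

  def resolve(entry)
    return entry[:first_result] if entry[:first_result]  # 2.1: the search already had a result
    found = @external_api.call(entry[:query])            # 2.2: ask the external API
    @catalog << found if found                           # 2.2.1: store it for future searches
    found
  end

  def bump_counters(label, now)
    @counters["searches:total:#{label}"] += 1
    @counters["searches:#{now.year}:#{label}"] += 1
    @counters[format("searches:%d-%02d:%s", now.year, now.month, label)] += 1
  end
end
```

In a real application, record would be called from the search action and process from the scheduled Sidekiq job; keeping the expensive lookups out of record is what keeps each keystroke cheap.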


Of course, there is still a lot of room for improvement. Here are some edge cases that I'd like to fix in the future:

  1. A person can type "Star" and be satisfied with the results, but click on the second or third result instead of the first. At the moment, the algorithm only saves the first result. One workaround is to save the page to which the user was redirected after the search. I'm considering adding another layer to the algorithm in the show action. Because each search is assigned a key in the user's session, it's simply a case of checking whether the person opened a specific movie page and prioritizing this action over the first result.

  2. Every time the search box is empty, a new random key is assigned because the algorithm considers it a new search. While this works in most cases, some people select all the text inside the box and begin typing right away. The algorithm does not consider that a new search (because the input is never empty). I'm considering a fix for this, and one possibility is to use some JavaScript to limit the keys pressed.
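
Both edge cases can be sketched in plain Ruby. This is only a hedged illustration, not the fix used in the project: SearchKeyTracker and intended_movie are hypothetical names, and the prefix check is an alternative heuristic of my own choosing (instead of limiting keys in JavaScript) that treats any input that neither extends nor shortens the previous one as a new search.

```ruby
require "securerandom"

# Hypothetical helper: decides which search key a keystroke belongs to.
class SearchKeyTracker
  def initialize
    @key = nil
    @last_query = ""
  end

  # Returns the key for this input, rotating it when a new search seems to start.
  def key_for(query)
    rotate = @key.nil? || query.empty? || !continuation?(query)
    @key = SecureRandom.hex(8) if rotate
    @last_query = query
    @key
  end

  private

  # Same search while one string is a prefix of the other (typing or backspacing).
  # Anything else looks like "select all and retype".
  def continuation?(query)
    query.start_with?(@last_query) || @last_query.start_with?(query)
  end
end

# Hypothetical resolver: prefer the movie page the user opened over the first result.
def intended_movie(first_result:, visited_movie: nil)
  visited_movie || first_result
end
```

The heuristic has a known gap: if someone selects everything and retypes a query that starts with the same letters, it still looks like a continuation, so it reduces the problem rather than eliminating it.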

After the algorithm discovers what people are searching for, it simply displays the number of searches stored in the database, as well as the hit count of how many searches were made.

The deploy

I expected the deployment to be a little more difficult because it would require deploying not only Rails and Postgres, but also Redis and Elasticsearch. In the end, everything worked smoothly on my Dokku server, and the most difficult part was configuring Sidekiq. Here are some issues I encountered:

  1. Stimulus controllers were not being bundled. It took me a while to figure out that even though I was using importmap for JavaScript, my JavaScript configuration file was not set up correctly. The fix was simply to follow the manual installation instructions for Stimulus.

  2. Sidekiq requires extensive configuration to run in a Dokku environment. Fortunately, there is a step-by-step tutorial available here.

  3. Because this is my first project hosted on a subdomain of my devanil.dev domain, I had trouble configuring DNS to point to the project. The solution was to migrate my DNS configuration to DigitalOcean and follow this tutorial.

This was one of the most challenging projects I've worked on, and I was pleased with the results. If you have any thoughts about my logic or code, please let me know; I really need this feedback to grow as a developer. Let's improve one line of code at a time.
