Virtually every app sooner or later needs some kind of search, and when it comes to searching your
database, a simple SQL LIKE is usually not enough for a number of reasons (no word stemming, poor speed, etc.). In this case, the obvious solution is full-text search.
When it comes to selecting a full-text search engine, you have a variety of options. The most popular open source engines are Apache Solr, Sphinx, and Elasticsearch. Heck, Postgres has its own full-text search implementation (Ryan Bates has an excellent railscast on it, and I will touch on it in later posts). As a Rails developer, I often look for solutions that integrate easily into a Rails app, i.e. I look for Ruby gems. And when it comes to full-text search, you are fortunate: there are plenty of options. sunspot_rails is powered by Apache Solr, thinking_sphinx by Sphinx, and tire by Elasticsearch. Moreover, there is at least one Heroku addon for each of these.
I am planning to devote this post and the next one to sunspot_rails, then write a post on Elasticsearch with tire, and finally a post on full-text search with Postgres.
Personally, I’ve been using sunspot_rails for more than two years to power full-text search on a couple of relatively big APIs with hundreds of thousands of documents each. So far, I’m happy with the results: Sunspot makes it very easy to set up, customize, and run complex search queries. And with more than 300,000 downloads, it is the most popular Solr client for Ruby applications.
Setting up Search
I’m not going to go into detail on how to set up sunspot_rails in your app, since it can all be found on sunspot’s readme page. I’ve been using sunspot to run search for an API containing job posting data. And here is the searchable block of my main model:
Nothing extraordinary. I run full-text search on two fields, title and description, giving title a little more weight in the relevancy calculation. I use the other fields for scoping. A sample search routine looks like this:
The code above will return the first page of results, available in @search.results. In addition, @search carries other useful meta information about the search results: faceting and pagination data.
Since this data lives on a REST API that backs several apps, it has to be returned as JSON. That is pretty straightforward for the @search.results collection. I use the rabl templating engine to help me with that:
However, returning faceting and pagination data in JSON is a little trickier. @search responds to the facets method, which in my case contains three collections of facet data. But if I just try to return @search.facets, it returns a very complex collection of objects like this:
This method, when invoked on a @search object, returns a hash of the facets above that can easily be serialized into JSON.
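A method along these lines could look like the following minimal sketch. It is not the original implementation: `facets_to_hash` is a hypothetical helper name, and `FacetRow`, `Facet` and `Search` are plain Ruby stand-ins so the example runs without Solr. The assumption is that each real facet object responds to `name` and `rows`, and each row to `value` and `count`.

```ruby
require 'json'

# Stand-ins for Sunspot's objects (assumption: real facets respond to
# `name` and `rows`, and each row responds to `value` and `count`).
FacetRow = Struct.new(:value, :count)
Facet    = Struct.new(:name, :rows)
Search   = Struct.new(:facets)

# Hypothetical helper: flattens facets into
# { facet_name => { value => count } }, which serializes cleanly to JSON.
def facets_to_hash(search)
  search.facets.each_with_object({}) do |facet, result|
    result[facet.name] = facet.rows.each_with_object({}) do |row, counts|
      counts[row.value] = row.count
    end
  end
end

# Sample data is made up for illustration.
search = Search.new([
  Facet.new(:category_id, [FacetRow.new(3, 42), FacetRow.new(7, 5)]),
  Facet.new(:location,    [FacetRow.new("Boston", 12)])
])

puts JSON.generate(facets_to_hash(search))
```

Keeping the result as a plain hash of hashes means the controller or template needs no knowledge of Sunspot's internal classes.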
All sunspot_rails results are returned by sunspot as WillPaginate::Collection objects, and therefore they contain all the necessary pagination information that may be needed to generate pagination links on a client:
Since the client app consuming this data will generate pagination links using the will_paginate gem, it is a good idea to return the pagination data in JSON in a “will-paginate-friendly” format, like so:
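As a minimal sketch of what such a format might contain, the example below extracts three values from a paginated collection. `pagination_to_hash` is a hypothetical helper, and `PagedResults` is a plain Ruby stand-in for the results collection; the assumption is that the real collection responds to `current_page`, `per_page` and `total_entries`.

```ruby
require 'json'

# Stand-in for the paginated results collection (assumption: the real
# object responds to current_page, per_page and total_entries).
PagedResults = Struct.new(:current_page, :per_page, :total_entries)

# Hypothetical helper: picks out the values a client needs in order to
# rebuild pagination links.
def pagination_to_hash(results)
  {
    current_page:  results.current_page,
    per_page:      results.per_page,
    total_entries: results.total_entries
  }
end

# Sample numbers are made up for illustration.
results = PagedResults.new(1, 20, 137)
puts JSON.generate({ pagination: pagination_to_hash(results) })
```

These three values match the arguments will_paginate takes when building a collection, so a client should be able to reconstruct one without fetching every record.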