< Back

How Algolia tackled the relevance problem of search engines

When we started Algolia, we wanted to build an offline search SDK for mobile applications. Unlike other search engines on the market, we couldn’t rely on pre-existing search engine software such as Lucene because they weren’t available on mobile.

We had to build everything from the ground up within the limitations imposed by smartphones such as low memory and low processing power. Ultimately, we spent a considerable amount of time reinventing the search engine from scratch.

When we eventually decided to pivot to a SaaS solution, we had built a completely new ranking algorithm. And in a typical “constraints foster creativity” situation, we had made two huge improvements: Speed and Relevance.

Algolia is known for its blazing-fast speed. It’s usually the first thing people notice when they start using it and also what makes it so exhilarating to use. But what I consider to be the most impressive and important achievement of Algolia is not actually speed but Relevance.

Algolia reinvented how search engines define relevance.

Before diving into the specifics behind Algolia’s ranking formula, let’s briefly go over how other search engines work.

A little history of search engines

In the late 90s, the Text Retrieval Conference organized a series of annual workshops financed in part by the US Department of Defense. The goal of these workshops was to determine the best ranking algorithm for searching a dataset consisting of long unstructured documents. The conference resulted in several enhancements to search engine ranking algorithms, the most defining being improvements to TF-IDF, what was then the main component of the most powerful ranking formulas.

TF-IDF is a statistical weighting scheme based on two elements: TF or Term Frequency (the number of occurrences of a word in a document, which is calculated on a per document basis) and IDF or Inverse Document Frequency (the importance of a query word within the whole corpus of documents).

Basically, if a query word is repeated multiple times within a document, that document will rank higher in the results than a document that only contains the query word once. IDF comes into play to reduce the importance of words that occur a lot in the corpus such as “the”, “of”, “a”, etc. The more times a query word recurs within the whole corpus, the lower a document containing that query word will be ranked. The length of the document is also taken into account; a short document with query word matches will rank higher than a long document with the same match.

TF-IDF (along with similar algorithms like BM25) became a central tool to score/rank documents given a user query, and it was used to search any type of data. Unfortunately, it was not actually designed for that! TD-IDF was built as an answer to TREC’s workshops and was designed specifically to work within long unstructured documents such as PDFs or enterprise documents.

The issue with this kind of statistical approach is that it surprisingly only makes sense for a very small number of use-cases, like enterprise document search, where:

  • You don’t really know what type of data you’re searching into
  • The content is drastically different from one document to another
  • The documents don’t have any popularity metrics, and you need to rank them solely based on their “textual relevance”

What Algolia was built for

Algolia wasn’t built to search long unstructured documents. It was built to power the search of websites and mobile applications. And in 99% of cases, this data is very different.

  • It has a defined schema with meaningful attributes that you know beforehand (title, name, list of tags, description…)
  • You know the underlying content, what matters and what doesn’t (e.g. is the title more important than the description?)
  • There are often one or more popularity metrics associated with each object (number of sales/likes/views)
  • As the owner, you sometimes have a strategy that impacts the ranking (give a bonus to featured objects)

Let’s look at an example

Say that you’re on IMDB searching for an actor, and you type “Brad”. You don’t care about the Term Frequency (how many times the document contains the word Brad), or the Inverse Document Frequency (how many times the word Brad is present in the list of all actors).

Fortunately, there are other signals that matter:

  • Is the word Brad in the “name” attribute or in the description?
  • Is it the first word or the last word in this attribute?
  • Is it spelled exactly B-r-a-d? Does it contain a typo? A suffix?
  • How many followers/likes/views does Brad have?

As you can guess, these questions give us a much better chance at finding Brad Pitt than a statistical approach like TF-IDF.

This is the first pillar of Algolia’s revolutionary improvements—the rules taken into account in the Ranking algorithm.

The rules in Algolia’s ranking formula

Algolia doesn’t rely on any variation of TF-IDF. Instead, it uses six default rules to evaluate the textual relevance of an object for a specific query:

  1. 1. Typo: Will rank higher a word that doesn’t contain a typing mistake.
  2. 2. Words: Will rank higher an object matching more of the words typed by the user if the query contains more than one word.
  3. 3. Proximity: Will rank higher words close to each other (George Clooney is better than George word Clooney).
  4. 4. Attribute: Will rank higher a match in a more important attribute (Title, Description).
  5. 5. Words position: Will rank higher words that are at the beginning of an attribute.
  6. 6. Exact: Will rank higher words that match exactly without any suffix (Algolia natively handles prefix search).

Most search engines have somewhat similar implementations of the Words and Attribute rules (some also have an implementation of Proximity). The rest of these rules are unique to Algolia. You can of course disable or customize the rules to fit your specific use-case, and you can also use non-textual criteria such as geo-location.

These rules are a great improvement over what existed before, but the biggest and most important change Algolia brought to the table is something else — how these rules are used to rank the objects.

How is the ranking done?

The biggest issue with the relevance of most search engines is not TF-IDF (which can actually be disabled in most search engines using it). It’s the way the ranking criteria are used. Other search engines compute a big floating-point score that mixes everything. They have a very complex mathematical formula with a lot of coefficients/boosts that merges every rule into one score for each document. Then, they use this unique score to rank the search results.

The first problem with this approach is that it mixes apples and oranges. For example, if your ranking formula takes into account TF-IDF and the number of likes of the documents:

  • What happens if a document has a very high number of likes? It will override all the other criteria like textual relevance, and this document will always show up in the results, even when the textual relevance score is very bad.
  • Similarly, if an object is the most relevant to the query but doesn’t have any likes, it won’t show up in the results.
  • The second issue with this approach is its complexity. Nobody is able to understand what’s really happening. Even the most skilled search experts need to spend a lot of time crafting the best mathematical formula. Via a trial-and-error process, they’ll find a formula that works well for a list of the 100 most popular queries, but the formula cannot be optimized for the whole catalogue. And as soon as the catalogue (or any metrics used in the formula) changes, they need to go back to the drawing board.

This exception-driven approach is not maintainable.

Algolia’s ranking formula is designed to fix this. And it does so with a different ranking approach via the Tie-Breaking algorithm.

The tie-breaking algorithm

Instead of calculating one big score and then sorting the results once, a tie-breaking algorithm calculates N scores and applies N successive sorting. Here’s how it works:

  • All the matching records are sorted according to the first criterion.
  • If any records are tied, those records are then sorted according the second criterion.
  • If there are still records that are tied, those are then sorted according to the third criterion.
  • And so on, until each record in the search results has a distinct position.

The order of the criteria determines their importance. It’s like in a children’s game—if you sort by color and then by shape, you don’t get the same results as when you sort by shape then color.

Configuring the relevance of Algolia for a specific use-case is as simple as figuring out the set of rules that need to be applied and ordering them by importance.

Oh, and everything is customizable—from the order of the attributes to which attributes are used. So there’s virtually no ranking strategy that you can’t build with this ranking formula.

Let’s consider an example with the following dataset:

[
 {
  "name": "Jon Black",
  "featured": true,
  "number_of_likes": 4
 },
 {
  "name": "John Jackson",
  "featured": false,
  "number_of_likes": 17
 },
 {
  "name": "John Paul",
  "featured": false,
  "number_of_likes": 3
 },
 {
  "name": "Jon White",
  "featured": false,
  "number_of_likes": 9
 },
 {
  "name": "John Thompson",
  "featured": true,
  "number_of_likes": 8
 }
]

And this index configuration:

Index Our ranking formula on Algolia's dashboard

To simplify our example, let’s ignore some of the rules and focus only on Typo, Featured and Number of Likes. For the query “John,” we’ll get the following results:

  1. 1. John Thompson (0 typos; featured; 8 likes)
  2. 2. John Jackson (0 typos; not-featured; 17 likes)
  3. 3. John Paul (0 typos; not-featured; 3 likes)
  4. 4. Jon Black (1 typo; featured; 4 likes)
  5. 5. Jon White (1 typo; not-featured; 9 likes)

In this example, we:

  • Decided that Typo was the most important rule. If an object has a typo, it should be ranked lower, no matter what the other rules are.
  • Gave a bonus to records marked “featured”.
  • Decided to position our main popularity metric, Number of likes, at the bottom of the formula so it doesn’t override the textual relevance criteria.

A few things to note:

  • We didn’t need to use coefficients or boosts of any kind. Nor did we need to craft some complex mathematical formula. But we still managed to take into account business metrics and merge them with our textual relevance.
  • This formula works for every query. It is not designed to work for the top 100 most popular searches, it is designed to work for the whole catalogue.
  • What happens if 50% of the catalogue changes tomorrow? Nothing. We don’t need to change the formula as long as the structure of the data remains the same.
  • It is very simple to understand why one result is ranked higher than another. And if you can understand the results, fine-tuning the formula will be much easier.

Where do we go from here?

As of writing this, Algolia is live on thousands of websites and mobile applications, and this ranking strategy has consistently proven successful. We have spent a long time fine-tuning every detail and are pleased with the level of performance. We continue to make relevance a core focus, in particular to introduce new criteria to better enable the personalization of results per user.

We’re always happy to help you figure out how you could use this algorithm to achieve the ranking strategy that best fits your specific use-case. And we’re eager to know what you think, so feel free to leave us a comment.

  • The goal is not just to sort on different fields but to have a global approach of ranking that takes into account both textual relevance and business data without using one big formula that is difficult to control. This is exactly what the Tie-Breaking algorithm is doing. To our knowledge, no other search engine allows to do that today. Do you know any way to do that with another engine?

    • Julien I don’t disagree that your tie breaking strategy can be effective. But its only one strategy. And its a strategy, with work, you could implement in Lucene-based search. A developer can’t incorporate all the functionality of Lucene into Algolia right now.

      Then again, if you know you want tie breaking scoring focussed on short snippets and a few other ranking factors. And you don’t want to deal with grouchy Lucene developers like me :-p then Algolia seems like a good solution.

      I disagree with the notion that there’s a single solution to relevance. I feel this article markets itself as a silver bullet. I think that leads people down a dangerous path of not considering how their relevance problem differs from a general relevance problem.

      Best,
      -Doug

      • Thanks for your feedback Doug, having different point of views is always nice! And don’t worry, we don’t claim to have found a solution to all the issues on all the use-cases. This approach is just the one we’ve found to work best when we encountered the problems that you mentioned.

        And transparency is part of our DNA, so we thought we’d share it! Similarly, we have an ongoing “Inside the engine” series explaining more on the internals of what we built, have you seen it?

  • Charlie Hull

    It’s interesting to hear how you’ve derived what is indeed a very simple ranking method, but it is foolish to dismiss TF/IDF and BM25 – they’ve worked very successfully for many engines and continue to do so. Most engines – certainly Solr and Elasticsearch – also allow extremely flexible boosting (which could be used to add the effect of a popularity score), allow fuzzy search or ‘did you mean’ spelling suggestions, proximity searches, minimum-should-match etc., covering all six features you’ve used for your ranking formula. I’m afraid “Nobody is able to understand what’s really happening” is incorrect as well – the algorithms are very well understood and documented.

    Search tuning *is* hard, and needs constant attention (we’ve written a series of blogs on the subject http://www.flax.co.uk/blog/2016/03/18/get-started-improving-site-search-relevancy/) but this isn’t because there’s anything wrong with the underlying algorithms.

    • Nicolas Baissas

      Interesting take, thanks! Yes, these approaches have been and are still successful for many engines, no discussion here. Our point here is: as an industry, we can probably do better.

      You mention that “Search tuning *is* hard, and needs constant attention”. I agree. But we think there are better alternatives than TF-IDF and the likes to make it less hard, more reliable, while improving the relevance.

      Good point though that there are ways to mimic most of our ranking rules with other engines. Algolia handles all those features natively, and combines them with a good performance. For example, combining the typo-tolerance and the proximity matching in ES (while ranking by typos, not only matching results with typos) becomes almost impossible – and very slow – when the dataset is big enough.

  • John Berryman

    It seems reckless to effectively toss aside an entire branch of academic research that has now spanned decades. By using your own heuristics in place of the well tested TF*IDF or BM25 algorithms this it what you’re doing.

    For other comment readers, here is a very approachable academic text on relevance http://nlp.stanford.edu/IR-book/, or if you want a more engineer-focused text see what you think about this Manning book: http://www.amazon.com/Relevant-Search-applications-Solr-Elasticsearch/dp/161729277X?ie=UTF8&redirect=true&ref_=pe_2313370_193871620_em_ti (I’m partial to it.)

    • Nicolas Baissas

      Thanks for your input. Actually we agree, TF-IDF and BM25 are impressive achievements of academic research. We have used them, enjoyed it, and recognize that they’re the best solution for what they’re trying to solve.

      The thing is: they’re built to solve searching in long unstructured chunks of text. We’re built to search into structured documents. And today, most apps/websites have structured data, with a rich structure and lots of meta data like popularity metrics.

      And algorithms like TF-IDF are not designed for that: the number of times “Rihanna” is written on the description of a profile on Spotify doesn’t indicate how good it is for this query, same thing for “Brad” on IMDB, or “Milk” in an ecommerce store.

      • Solr and Elasticsearch aren’t opinionated about term frequency. You can disable term frequency easily. And for short snippet search, it makes sense to do so, as I write about in this article no title search

        http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/

        The difference is Algolia works out of the box for short snippets. Algolia can be thought of as a fairly opinionated relevance solution. Solr/ES are rather unopinionated relevance frameworks. To the point where you can use any strategy under the sun, from controlled vocabularies, NLP, boosts on factors like recency and popularity, personalization components, and even optimizations like algolias, to solve relevance problems.

      • John Berryman

        Ah I see the concern you’re addressing with Algolia. So Algolia is tuned to be good for short field search (titles, names, etc.) whereas a more general search engine would be better for full text search, for instance searching product descriptions. Is that about right?

  • This is a very misleading portrayal of Lucene-based search. Just because TF*IDF & BM25 are one way of ranking results, doesn’t make it the only relevance tool in the Lucene toolbox.

    I mean just look at the Elasticsearch Query DSL:
    https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

    And the list of similarities other than TFIDF
    https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/package-summary.html

    And text analyzers… for dozens of languages
    https://lucene.apache.org/core/4_0_0/analyzers-common/overview-summary.html

    And of course Lucene can deal with recency, popularity, etc. Both let you directly incorporate them into the ranking function anyway you want:
    https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor

    And this is the tip of the iceberg.

    Solr and Elasticsearch are *frameworks* for incorporating just about any possible way to rank results. Short snippets is just one use case. But in my work, its only one factor amongst many that folks need in a search solution. Personalization, taxonomies, ontologies, any feature that might describe a user or a piece of content need to be brought to bare. And having the power to express ranking very specifically and carefully is the relevance feature of Solr and Elasticsearch: not TF IDF.

    But I say that and don’t mean to discount Algolia. You guys have built something interesting, but I wouldn’t say it “solves relevance.” There’s sadly no silver bullet in relevance ranking. It takes work, dedication, and soft skills like understanding your users and business.

    • Eduard Dudar

      > You’ve described problems with direct additive boosting

      I see this problem with multiplicative boosting as well but with recency in my case. I’m not talking about extreme cases when document is almost completely irrelevant but when document is not new enough and falls out out of decay param but highly relevant it’s pushed down by less relevant but newer documents. My audience is very picky and they scream immediately when they see such a case.