Inside the Engine Part 8: Handling Advanced Search Use Cases

Building great search UX is one of the hardest problems in engineering, even for companies that have a large team working exclusively on search.

There are three essential hurdles to clear:

1) engine performance
2) interface intuitiveness
3) result relevance

We’ve addressed search performance in several of the previous articles of this series; this article will focus on the link between user interface and relevance. It will cover a wide range of use cases, from common to very specific. If users complain that they cannot quickly find what they are searching for, the cause is probably a mix of an unintuitive interface and poor relevance, which is why it is important to always consider the two together.

Providing good relevance for the most common cases

When people are searching for something very general, like a company name on LinkedIn or a person on Twitter, you should make sure the ranking takes a notion of popularity into account: a result with more followers is more likely to be what the user is searching for. Solving this problem has been our primary focus at Algolia from day one. We have addressed it by developing a configurable but easy-to-understand ranking algorithm that gives developers full control of their search configuration. You can configure popularity via Custom Ranking, address complex problems like typosquatting, promote featured items, etc.

There are endless ways to configure the ranking, depending on the use case. We could even help Twitter solve their search issues :), as you can see in the following example:

Searching for the official Barack Obama account on Twitter with a typo does not return the official account because of typosquatting (May 2017)

Providing good relevance for advanced search use cases

Solving advanced search use cases is an entirely different problem, where you need to let the user refine search results via filters. This section covers the three main approaches used to facilitate advanced search.

Use of faceting

Faceting is probably the most standard way to address advanced use cases. The principle is to let the user filter the results based on categories to reduce the size of the results set and remove all false positives or ambiguity. This feature is used in a wide variety of use cases, and there are a lot of different ways to present faceting results, as you can see in our faceting documentation.

Faceting example: filter on genres and rating

That said, faceting is not the perfect solution in all situations as two significant concerns can hurt your relevance:

1) It requires homogeneous data: all your records need to use the same categories to provide a good UX. If this is not the case, you might have duplicates in your faceting results, which will lead to users not finding their record while filtering.

2) The category that the user wants is hidden: when the number of categories to display is large, you won’t be able to show all of them, and you might miss the one that the user wants. It will be a challenge to satisfy this user, who will probably just consider your search not good enough.
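Both concerns can be seen in a minimal in-memory sketch of facet counting. The records, the "genre" attribute, and the facet_counts helper are assumptions made up for this illustration; this is not how the engine computes facets:

```python
from collections import Counter

# Hypothetical movie records; the "genre" values are deliberately inconsistent.
records = [
    {"title": "Alien", "genre": "Sci-Fi"},
    {"title": "Arrival", "genre": "Sci-Fi"},
    {"title": "Heat", "genre": "Crime"},
    {"title": "Goodfellas", "genre": "Crime"},
    {"title": "Blade Runner", "genre": "Science Fiction"},
    {"title": "Amelie", "genre": "Romance"},
]


def facet_counts(records, attribute, max_values=None):
    """Count how many records carry each value of `attribute`,
    most frequent first, optionally keeping only the top `max_values`."""
    counts = Counter(r[attribute] for r in records if attribute in r)
    return counts.most_common(max_values)


# Concern 1: "Sci-Fi" and "Science Fiction" show up as two separate categories.
all_genres = facet_counts(records, "genre")

# Concern 2: when only two values are displayed, the others are hidden.
top_two = facet_counts(records, "genre", max_values=2)
```

A user who filters on "Sci-Fi" here would never see Blade Runner, and a user looking for "Romance" would not find that category in the truncated list at all.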

Search in faceting results

To fix the second problem (faceting not visibly proposing the category that the user is looking for), you can offer users a search for facet values. This means that the full range of categories does not even have to be displayed. Rather, you can offer the most relevant or meaningful categories, and let the user search for the rest. You can find this type of experience on LinkedIn: you can filter on locations when you are looking for a person or a company. LinkedIn lets you search for a particular location that is not listed in the displayed facet values, which allows you to be very granular in your search.

The only problem that you might have on LinkedIn is that this search is not contextual: you are searching inside all locations, without the current textual query and filters being applied. Let’s say that you are looking for a Director of Marketing job in Buenos Aires but that there aren’t any available on LinkedIn. After you’ve searched for the job title, LinkedIn allows you to refine your search and select “Greater Buenos Aires” in the location facet value search, although this will not return any results. This is frustrating, as you’d naturally expect not to be offered filters that lead nowhere.


Implementing a search in faceting results while applying the contextual query is very complex, and this is probably the reason why LinkedIn preferred to implement a degraded version. A better search experience is our primary motivation at Algolia, which is why, a few months ago, we released a feature called search for facet values that allows you to develop this type of experience in minutes while applying the contextual query. Here is an example of an Algolia search inside the values of the brand facet in an e-commerce context.

Search inside facet values: example of the brand facet

As you can see, the search is completely contextual: all filters/refinements are related and inclusive of one another. You can find more documentation about how to use it in our developer documentation.

Advanced query syntax

Search for facet values is very useful when you want to expose an advanced search to your users without burdening them with a learning curve. Avoiding a learning curve is mandatory for a consumer product. If you are working on a business product that people use every day, you might want to expose an advanced syntax inside the search box, like Slack does.

Example of advanced syntax in a search box (Slack)

In practice, proposing such an experience requires the same feature as searching for facet values. It is just a different way to expose the same feature. The goal of this display is to let your advanced users directly perform their advanced query via the keyword and minimize the number of steps they will need to search for their content. You can implement such an interface easily with our search for facet values feature. We will release a guide soon to help you through the implementation.
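To give an idea of the client-side part of such an interface, here is a minimal sketch that splits the content of a search box into free text and filters. The `attribute:value` syntax, the attribute names, and the parse_advanced_query helper are assumptions chosen for this illustration; they are not Slack's syntax or part of any Algolia library:

```python
import re

# Hypothetical syntax: attribute:value or attribute:"quoted value".
TOKEN = re.compile(r'(\w+):(?:"([^"]*)"|(\S+))')


def parse_advanced_query(text, known_attributes):
    """Split a search-box string into (free_text, filters).

    Tokens whose prefix is not a known attribute stay in the free text.
    """
    filters = {}

    def replace(match):
        attribute = match.group(1).lower()
        if attribute not in known_attributes:
            return match.group(0)  # leave unknown prefixes untouched
        quoted, bare = match.group(2), match.group(3)
        value = quoted if quoted is not None else bare
        filters.setdefault(attribute, []).append(value)
        return ""  # remove the recognized token from the free text

    free_text = " ".join(TOKEN.sub(replace, text).split())
    return free_text, filters


free_text, filters = parse_advanced_query(
    'director company:Twilio location:"San Francisco"',
    known_attributes={"title", "company", "location"},
)
```

While the user is still typing the value after a prefix such as `company:`, the partial value can be sent as a search for facet values on that attribute to suggest completions.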

The implementation

We described in the previous sections how the search for facet values feature is useful for implementing an advanced search interface. We will now focus on the implementation to discover how this feature works internally.

I am sure you will think this type of search simply requires a change in the way we perform the query, like rewriting the query to specify which attribute we will target. But in practice, this is a significant change as we do not return records but rather highlighted facets with counts.

Let’s take a simple example of a professional social network like LinkedIn to illustrate the implementation. We will take simple records containing only four attributes. Those attributes will contain a string and will all be configured as attributes for faceting:

  • The name of the person
  • Their title
  • The company where they work
  • The location of the company

Here is an example of such a record:
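A minimal illustrative record, written here as a Python dictionary. The attribute names and all values are assumptions made up for this example:

```python
# Illustrative record: attribute names and values are invented for this example.
record = {
    "name": "Jane Doe",
    "title": "Director of Sales",
    "company": "Twilio",
    "location": "San Francisco",
}

# All four attributes hold a string and would be configured as facets.
attributes_for_faceting = ["name", "title", "company", "location"]
```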

At first sight, this seems like a simple problem to solve for any search engine. If a user performs the query “Twilio”, they will retrieve hundreds of Twilio employees. Let’s say the user wants to sell Twilio a service in San Francisco and wants to see all the Director titles to find the closest one to their service (this is, of course, a purely fictional use case :)). Let’s look at what you need:

1) The people need to work for Twilio, so you can restrict the search on “Twilio” to the “Company” field.
2) They need to be a Director, so you can restrict the search on “Director” to the “Title” field.
3) If multiple people have the same title, you want to see the title only once, so you need to deduplicate the results. To achieve this, you can use a facet on the attribute “Title”, and display the facet values returned.

In theory, this search should provide the results that you want, but in practice, it’s more complex than that:

1. People often have multiple job titles in their profiles. If someone has both “Director of Sales” and “Account Executive” listed as job titles, this query strategy will list both job titles (because the query relies on faceting). Do you really want to see “Account Executive” in the list of results for the query “Twilio Director”?

2. Since we’re displaying facet values, and not search results, we cannot highlight the words that match the query. It’s always better to have highlighting, particularly when people type something with typos (Diretcor), or if you search at each keystroke (Direc).

3. It creates a complex UI/UX, because users will need to specify that Twilio should only be searched in Companies, instead of allowing them to search Twilio in all fields (which would also potentially improve the relevance, on top of reducing the complexity).
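The first problem can be reproduced with a minimal in-memory sketch of this strategy. The profiles and the naive_title_facets helper are hypothetical, and the "title" attribute holds a list here to model people with several job titles:

```python
from collections import Counter

# Hypothetical profiles; "title" is a list because people often list several.
profiles = [
    {"name": "A", "title": ["Director of Sales", "Account Executive"], "company": "Twilio"},
    {"name": "B", "title": ["Director of Marketing"], "company": "Twilio"},
    {"name": "C", "title": ["Director of Sales"], "company": "Twilio"},
    {"name": "D", "title": ["Software Engineer"], "company": "Twilio"},
    {"name": "E", "title": ["Director of Sales"], "company": "Acme"},
]


def naive_title_facets(profiles, company, title_word):
    """Restrict `company` and `title_word` to their fields, then facet on title."""
    hits = [
        p for p in profiles
        if p["company"].lower() == company.lower()
        and any(title_word.lower() in t.lower() for t in p["title"])
    ]
    # Faceting counts every title of every matching profile, relevant or not.
    return Counter(t for p in hits for t in p["title"])


facets = naive_title_facets(profiles, "Twilio", "Director")
```

Profile A matches because of “Director of Sales”, so its other title, “Account Executive”, ends up in the facet values even though it has nothing to do with the query.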

There is a better solution: search directly in facet values while providing the context of the query. Let’s see how we can do this by using the “search for facet values” feature on the facet “Title”:
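As a rough sketch, such a request combines three pieces of information. The key names below are illustrative assumptions, not the exact parameters of the API:

```python
# Hypothetical description of the request; the key names are illustrative,
# not the exact parameters of any client library.
facet_search_request = {
    "facet_name": "title",      # the facet whose values are searched
    "facet_query": "Director",  # text matched against the facet values
    "query": "Twilio",          # contextual query applied to the records
}
```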

This query will be applied in two steps:

First, it will retrieve only the results containing the word Twilio in one of the searchable attributes. From this list, it will extract all of the values for the facet Title (i.e., all of the job titles listed in every profile that contains the word Twilio).

Then, in this filtered list of job titles, it’ll search for the ones that contain “Director”. This allows us to only retrieve relevant results, by using all of the regular search features (typo-tolerance, highlighting, prefix-search, ranking…). The result is the list of all the Twilio job titles containing “Director”, deduplicated and ordered by count — exactly what we were looking for.

In other words, this feature requires a two-step process that is only doable efficiently when implemented at the heart of the engine.
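The two steps can be sketched in a few lines of in-memory Python. This is only an illustration of the logic under simplifying assumptions: the profiles and helper names are hypothetical, matching is a plain substring or prefix test, and typo-tolerance and the engine's actual data structures are left out:

```python
from collections import Counter

# Hypothetical profiles; "title" is a list because people often list several.
profiles = [
    {"name": "A", "title": ["Director of Sales", "Account Executive"], "company": "Twilio"},
    {"name": "B", "title": ["Director of Marketing"], "company": "Twilio"},
    {"name": "C", "title": ["Director of Sales"], "company": "Twilio"},
    {"name": "D", "title": ["Software Engineer"], "company": "Twilio"},
    {"name": "E", "title": ["Director of Finance"], "company": "Acme"},
]


def values_of(record, attribute):
    """Return an attribute's value(s) as a list of strings."""
    value = record.get(attribute, [])
    return value if isinstance(value, list) else [value]


def highlight(value, facet_query):
    """Wrap the first word prefix matching `facet_query` in <em> tags.

    Returns None when no word of `value` starts with the query.
    """
    needle = facet_query.lower()
    if not needle:
        return value
    position = 0
    for word in value.split(" "):
        if word.lower().startswith(needle):
            end = position + len(needle)
            return f"{value[:position]}<em>{value[position:end]}</em>{value[end:]}"
        position += len(word) + 1  # +1 for the separating space
    return None


def search_for_facet_values(records, searchable, facet, facet_query, query=""):
    """Return (value, highlighted value, count) tuples, most frequent first."""
    words = query.lower().split()
    # Step 1: keep records where every query word appears in a searchable
    # attribute, then count the facet values of those records only.
    hits = [
        r for r in records
        if all(
            any(w in v.lower() for a in searchable for v in values_of(r, a))
            for w in words
        )
    ]
    counts = Counter(v for r in hits for v in values_of(r, facet))
    # Step 2: search inside those facet values (prefix match, with highlighting).
    results = []
    for value, count in counts.most_common():
        highlighted = highlight(value, facet_query)
        if highlighted is not None:
            results.append((value, highlighted, count))
    return results


results = search_for_facet_values(
    profiles, ["name", "title", "company"], "title", "Direc", query="Twilio"
)
```

With this sample data, “Account Executive” and “Software Engineer” are dropped in the second step, and “Director of Finance” never appears because its profile does not match “Twilio” in the first step.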

Evolutions

The release of this feature enabled all our existing users to build a great advanced search interface quickly. We, of course, do not plan to stop here and are already thinking about the next evolutions of the feature. For the moment, the ranking of results is pretty basic and only uses the frequency of facets, which means a result matched with a typo can be ranked first. We plan to improve the feature by providing a different way to rank the results.

If you want to know more about the internal aspect of the Algolia engine, we recommend reading the other posts in this series: