A Blog About Programming, Search & User Experience

JadoPado delivers InstaSearch for mobile and web, powered by Algolia

Algolia Increases Online Search Sessions By 60% and Unique Mobile Searches by 270%

The following post is a guest post by Omar Kassim, co-founder of JadoPado.

Founded in 2010, JadoPado is one of the largest e-commerce sites serving the GCC, Middle East, North Africa and South Asia. Its CEO Omar Kassim wanted to bring an Amazon-like experience to the region. In just 3 years of operations, the company now boasts thousands of customers, hundreds of vendors and over $7 million in annual revenues.

Realizing that search is a key component of their user experience and engagement, Omar and his small team of 15 set off to build new search capabilities that would help users find the products they wanted, lightning fast. In addition, the team was developing a revamped mobile experience and saw that search needed to be spot on for both smartphones and tablets. “I saw search as a competitive tool and as a strategy to get a leg up on our competition.  After seeing Algolia on Hacker News I was absolutely blown away.  After looking at the demos, we threw out what we were doing internally in terms of a small search revamp and I had one of our team get cracking with Algolia right away. As a little startup, it really helped that Algolia’s price points were within reach in terms of not breaking the bank to get things rolling.”

The Power of Instant Search

After configuring and testing Algolia for two weeks, JadoPado had the results they were looking for. Branded internally as InstaSearch, the new feature promised to dramatically improve how search functioned on both mobile and the web. “The idea from the outset was to build InstaSearch. I kept ending up at the Algolia demo and thought it would be incredible if we could forget all user interaction aside from typing and just display results right away. Remove what you’ve typed and the results disappear, taking you back to where you were. We then spent a bit of time figuring out how to get each result “page” to have a URL that could be used with external search or shared elsewhere,” explained Omar.


Making Search Seamless

“We looked at a number of solutions. One of our biggest intentions was to try to get search to be extremely fast and as slick as possible. Customers should feel like search “just works” and that it is a super easy way to get straight to whatever they may be looking for. Algolia has allowed us to accomplish that. Moving search from a not-really-working internal model to a search-as-a-service platform has allowed us to focus on other areas while knowing that search works and that we’ve got an edge over our competition,” Omar explained.

Support For Arabic

With more than 20 countries to support, the JadoPado team knew that the key to success in the region was to ensure that search be delivered in Arabic as well. Omar explained, “The final bits were figuring out a separate set of indexes for Arabic (as we were about to roll out a standalone Arabic version of JadoPado) and getting the faceting right. This was easy to do with the deep Algolia documentation.” Algolia works with all languages, including Chinese, Japanese, Korean and Arabic. No specific configuration is required; speed and ranking perform exactly the same way.

Better Business Through Search

In May the team rolled out InstaSearch, Arabic support and a newly revamped mobile experience with search at the center. JadoPado immediately saw conversions double and activity reach triple that of a typical day. Compared to the same 30-day period in 2013, JadoPado saw site visits through search increase from 8.2% to 11.3%.


  • Sessions with search jumped 59.96%.
  • Unique searches jumped 46.87%.
  • Average search depth increased by 58.87%.

Mastering Mobile Through Search

The greater impact of Algolia’s hosted search was on JadoPado’s revamped mobile experience. Search is often the first action customers take on a mobile device. With instant search, autocorrect and full language support, improving search and the quality of results can have a significant impact on revenues. With Algolia implemented as part of JadoPado’s mobile site, the company saw strong results: visits from search increased from 4.3% to 15% over the same time period, and session exits decreased by 16.57%. A big change. And search increased engagement on all levels:

  • Mobile sessions with search jumped by 233.92%.
  • Total unique mobile searches jumped 268.37%.
  • Average search depth on mobile devices jumped by 41.05%.

Images courtesy of JadoPado. Learn more on their website.

AfterShip Leverages Algolia’s Search as a Service to Track 10 Million Packages Around The World

Algolia Speeds Up Search Result Delivery Times From 10 Seconds To 250 Milliseconds

The following post is a guest post by Teddy Chan, Founder and CEO at AfterShip.

AfterShip is an online tracking platform that helps online merchants track their shipments across multiple carriers and notify their customers via email or mobile. Being an online merchant myself, I shipped more than 30,000 packages a month around the world. When customers contacted me for updates on their shipments, I realized that I couldn’t track shipments from different carriers and get updates on their status in a single place. So I built AfterShip to allow both consumers and online merchants to view all their packages on a single platform.

After winning the 2011 Global Startup Battle and 2011 Startup Weekend Hong Kong, AfterShip launched in beta and quickly helped thousands of online merchants send out over 1,000,000 notifications to customers.

One of the key parts of our service is providing customers around the world with up-to-date information about their packages.
Right now we have more than 10 million tracking numbers in our database. This causes a few different challenges when it comes to search and we needed technology that would help us continuously index constantly changing information.

Our first challenge is that we are a small team with only 1 engineer.

We are not in the search business, so we needed a solution that would be easy to implement and work well with our existing infrastructure. Algolia’s extensive documentation made it clear that setup and implementation would be extremely fast and that it would work with any language and database, so we could get back to our core business.
Algolia was super easy, we had it tested, up and running in a week.

Our second challenge was quickly delivering search results.

With Redis, searching for packages was practically impossible. Each query would lock up the store until the result was found, so it could run only one search at a time, and each search took up to 10 seconds. With Algolia we reduced search result delivery times to 250 milliseconds for any customer anywhere in the world. When you think about thousands of merchants who send more than 1 million packages per month, you can see how speed is critical.

Downtime also is not an option when tracking packages around the globe.
We are very strict when adopting new technologies and SaaS technologies can’t slow down our system.
Algolia had the highest uptime of all the solutions we looked at; we experienced no downtime at all.

Our final challenge was search complexity.

Sometimes you need to know how many shipments are coming from Hong Kong and exactly where they are in transit to and from the U.S. Shipments going around the globe can change status several times within a single day. With Algolia’s indexing we are able to instantly deliver up-to-date notifications on all 10 million packages, so that customers can not only track their package on its journey, but also go to their online merchant’s shop and see a real-time status of their package.

In the end, it was Algolia’s customer service that won us over.
Similar services and platforms were not responsive. With Algolia, we either had the documentation we needed, were able to get advice from an engineer immediately, or had our problem solved in less than a day. With such a small team, this means a lot. And with the Enterprise package, we know that Algolia will grow as quickly as our business does.

Want to find out more about the Algolia experience?
Discover and try it here

Our 4th datacenter is in California!

Do you know the 3 most important things in search? Speed, speed, and speed!

At Algolia, we work at making access to content and information completely seamless. And that can only be done if search results are returned so fast that they seem instant.

That means two things for us: getting server response time under 10ms (checked), and getting the servers close to end-users to lower latency.

We are on a quest to make search faster than 100ms from anywhere in the world, and today is an important step. We are thrilled to announce the opening of our 4th datacenter, located in California!


You can now choose to be hosted on this datacenter when signing up (multi-datacenter distribution is also available for enterprise users).


Concertwith.me’s Competitive Edge: A Revamped Search UX with Algolia

There are a lot of music discovery apps on the market, yet sifting through concert listings is anything but seamless. That’s why Cyprus-based startup Concertwith.me aims to make finding local concerts and festivals as intuitive as possible. Automatically showing upcoming events in your area, the site offers personalized recommendations based on your preferences and your Facebook friends’ favorited music. Covering over 220,000 events globally, the site uses Algolia to offer meaningful results for visitors who are also looking for something different.

Founder Vit Myshlaev admits that concert sites often share the same pool of information. The differentiator is how that information is presented. “The biggest advantage one can have is user experience,” he explains. “There’s information out there, but do users find it? The reason that people don’t go to cool concerts is that they still don’t know about them!”

As an example, he showed me one of the largest live music discovery sites on the web. Searching for an artist required navigating a convoluted maze of links before pulling up irrelevant results. “Users have to type in queries without autocomplete, typo-tolerance, or internationalization. They have to scroll through a long list of answers and click on paginated links. That’s not what people want in 2014,” said Myshlaev.

To simplify search and make the results more relevant, Concertwith.me used our API. “We got a lot of user feedback for natural search,” Myshlaev wrote. Now visitors can search for artists and concerts instantly. With large user bases in the United States, Germany, France, Spain, Italy, Russia and Poland, Concertwith.me also benefits from Algolia’s multi-lingual search feature. “We’ve localized our app to many countries. For example, you can search in Russian or for artists that are Russian, and results will still come up,” says Myshlaev.


For users with a less targeted idea of what they’re looking for, Concertwith.me implemented structured search via faceting. “We also realized that some visitors don’t know what they want. Algolia search helps them find answers to questions like, Where will my favorite artist perform? How much do tickets cost? Are there any upcoming shows?”


Concertwith.me’s goal is to reduce informational noise so that users can find and discover music as soon as possible. The startup experimented with a number of other search technologies before reading an article about us on Intercom.io, which inspired Myshlaev. “When I saw what Algolia could do, I knew that this was the competitive edge I was looking for.”

Want to build a search bar with multi-category auto-completion like Concertwith.me? Learn how through our tutorial.

How Abacus Leverages Algolia for Realtime Expense Reporting

When one thinks of expense reporting, speed is far from the first descriptor that comes to mind. Companies spend a substantial amount of time tracking expenses, while employees linger in paperwork purgatory, wondering when they will be reimbursed for their work-related charges. That’s why Abacus has made it their mission to simplify expense management so that it occurs in real time. Their creative implementation of Algolia helps make it happen.

Abacus is a mobile and desktop application that allows small businesses to track and verify expenses on the go. Employees can upload a photo of their receipt on the mobile app, and Abacus takes care of the rest. “For each expense, we have a lot of data. We have the person who expensed it, the amount of the expense, the time, and where it took place. We also have a lot of metadata. For example, if you went out to breakfast, we pull in the name of the restaurant, the address, the URL of the merchant. There’s tags and categories and so on,” explains Ted Power, Co-Founder of Abacus. “And we wanted to make all of that searchable.”


To make all of that data accessible and interpretable for a financial manager, Abacus turned to our API. “Algolia made it super easy for us to get faceted, advanced search options. If you are the finance person at your company, you can basically say ‘Show me all of the expenses over $50,’ or ‘Show me all the expenses that don’t have a receipt.’ You can look at expenses for one person or one category, like travel. You can even pivot off of 8 of these different things. Algolia makes it super easy to do,” says Power. This accelerates the process of expense verification and approval. “It’s good search. We have tags like ‘car rental’ on auto-complete, for example. That’s all Algolia.”
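The kinds of questions Power describes map naturally onto facet and numeric filters. Here is a minimal Python sketch of that filtering logic over an in-memory expense list (the field names are hypothetical, not Abacus's actual schema):

```python
# Hypothetical expense records, mirroring the fields described above.
expenses = [
    {"person": "alice", "amount": 72.50, "category": "travel", "receipt": True},
    {"person": "bob", "amount": 18.00, "category": "meals", "receipt": False},
    {"person": "alice", "amount": 95.00, "category": "travel", "receipt": False},
]

def facet_filter(records, min_amount=None, category=None, has_receipt=None):
    """Filter records the way faceted search narrows a result set:
    each facet constraint that is set must hold for a record to pass."""
    out = []
    for r in records:
        if min_amount is not None and r["amount"] <= min_amount:
            continue
        if category is not None and r["category"] != category:
            continue
        if has_receipt is not None and r["receipt"] != has_receipt:
            continue
        out.append(r)
    return out

# "Show me all of the expenses over $50"
over_50 = facet_filter(expenses, min_amount=50)
# "Show me all the expenses that don't have a receipt"
no_receipt = facet_filter(expenses, has_receipt=False)
```

In a hosted engine, the same constraints would be expressed as facet and numeric filters on the query rather than scanned in application code; the point is that each "pivot" is just one more conjunct.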

Power adds that a “great implementation experience” was especially beneficial for the startup. “It’s the kind of thing that would have taken ages to build from scratch.” Co-Founder Joshua Halickman chimed in: “Being able to get up and off the ground really quickly was great. In general, I love the speed. Crazy fast. Really nice.”


Images courtesy of Abacus. Learn more on their website.

Deploying Algolia to Search on more than 2 Million Products

The following post is an interview of Vincent Paulin, R&D Manager at A Little Market (recently acquired by Etsy).

As a fast-growing e-commerce site for handmade goods in France, A Little Market has seen its marketplace grow from a few thousand to over 2 million products in just 5 years. With 90,000 designers and artisans using the A Little Market marketplace to buy, sell and collaborate, search quickly became a major part of their e-commerce strategy and user experience.


What did you have in place as a search solution?

“We implemented a Solr-based search 5 years ago and had been trying to tweak it to fit our growing needs. We had selected this system for its flexibility; however, over time, that flexibility translated into constant maintenance, modifications and lower relevance in our search results.

Then we investigated Elasticsearch. It is complex, yet powerful. As I was diving deeper into Elasticsearch I realized that I could quickly gain an “ok” search experience; however, a powerful search experience would mean investing more time than we had to configure it. Then I did a little math:  learning the platform would take a few weeks, configuring servers – a few days, and configuring and tuning semantic search perfectly – several months.

Then we found Algolia.  We only had 3 months and knew Algolia would be much easier to implement, so we A/B tested everything to see how it would impact the search experience.

Can you tell us more about your integration process?

The first thing we wanted to get done was to index all the shops and our most popular searches to power an autosuggest widget. Building this autosuggest with a basic configuration took us 2 days.

Then we built an automatic task to aggregate shops and popular searches every day and configure the Algolia indices. We also took on the task of creating the front-end JavaScript plugin. With the Algolia documentation and the examples on GitHub, it took us less than 1 hour.

The results of this first test were very encouraging. With around 500k requests per day, the response time was about 4 milliseconds on average, and we saw the conversion rate triple compared to the previous search bar with no suggestions. For A Little Mercerie, another marketplace we manage, the improvement was about 4 times greater.

After this first test, we were ready to fully commit to Algolia for our whole search experience. The first step was to create a script to index our entire product database in Algolia. This was easy to do with batch inserts into Algolia indices. We selected some attributes of our products, such as the title, categories, materials and colors, to be indexed. That was a first try; we wanted it to be quick and simple.
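That batch-import step can be sketched roughly as follows. Here `send_batch` stands in for whichever client call performs the batched write, since the exact method name depends on the API client; this is illustrative code, not A Little Market's actual script:

```python
def chunked(records, size=1000):
    """Yield successive batches; sending batched writes is far faster
    than one request per record."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def index_catalog(products, send_batch):
    """Push an entire product catalog using batch writes.
    `send_batch` is whatever client call performs the write
    (e.g. the Algolia client's batch / save-objects method)."""
    sent = 0
    for batch in chunked(products, size=1000):
        # Keep only the attributes worth searching on, as described above.
        payload = [{k: p[k] for k in ("objectID", "title", "categories",
                                      "materials", "colors") if k in p}
                   for p in batch]
        send_batch(payload)
        sent += len(payload)
    return sent
```

A batch size of around 1,000 records per request keeps each request small while still amortizing the HTTP overhead.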

With the help of the open source demo code, we developed a full JS sandbox that could display paginated results with faceting, to show our progress to the team. In less than a week, we had a fully working sandbox and the results were promising. Our query time averaged less than 20 milliseconds over 2 million records. With confidence, we started to refine the configuration on Algolia, testing it again and again, and adding attributes to index such as specific events (Christmas, Valentine’s Day), custom tags, etc.

In addition, we implemented sorted results, which are really relevant with the new numeric ranking option in the settings. At that point we were able to sort results by price, date, etc. You must create a specific index for each ranking you need. We also created a different index for each language (French and Italian) and took this opportunity to do the same across our other websites, alittlemercerie.com and alittleepicerie.com.

To do this, we created a custom API which abstracts the underlying search engine from all API clients. We lose real-time as-you-type search for now, but we need this layer to abstract everything and to collect data before returning the results.

The next step was to eliminate the “no results” pages. For that, we progressively mark the last words of the query as optional until we get some results. We never set the entire query as optional: we always keep at least the first word or the first two words required.
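The fallback described above can be sketched as a retry loop that demotes trailing words to optional ones. This is a simplified illustration, with a plain word matcher standing in for the real engine:

```python
def search_with_fallback(query, search_fn, min_required=2):
    """Retry a query, progressively marking trailing words as optional,
    until some results come back. At least the first `min_required`
    words are never made optional."""
    words = query.split()
    required = list(words)
    optional = []
    while True:
        hits = search_fn(required, optional)
        if hits or len(required) <= min_required:
            return hits
        # Demote the last required word to optional and retry.
        optional.insert(0, required.pop())
```

In the sketch, `search_fn` takes the required and optional word lists and returns matching documents; optional words would only influence ranking, so the toy matcher below ignores them.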

When search was ready, we still had plenty of time left to implement it in our client applications; implementing Algolia took less time than we had planned for. The speed of iteration with the Algolia API enabled us to test everything in a much shorter timeframe.

How has Algolia’s API helped search on A Little Market?

We are now able to answer 500 to 1,000 requests per minute, and each day we add 6,000 new products to the search engine while over 3,000 are removed, all in real time.

After this integration of the Algolia API, we saw our conversion rate on search increase by 10%. This represents tens of thousands of euros in turnover per month for us. In a few weeks of work with one engineer, we had replaced our main search engine with a better solution thanks to Algolia.”

Keeping Data in your Search Engine Up-to-Date

When we developed the first version of Algolia Search, we put a lot of effort into developing a data update API. It worked like this: you could send us a modified version of your data as soon as a change occurred, even if it concerned only a specific part of a record. For example, the update could carry a new price or review count, and we would update only that specific attribute in your index.

However, this initial plan did not take into account that most of our big customers would not benefit from this API due to their existing infrastructure. If your architecture was not designed to catch every update, or if you were not using a framework like Ruby on Rails, it could be very difficult to even get notified of these updates. The solution in this case was to run a batch update on a regular basis. It was a good method if you didn’t want to change a single line of code in your existing infrastructure, but the batch update was far from a cure-all.

The problem of batch update

There are two main ways to perform a batch update on a regular basis:

  1. Scan your database and update all objects. This method works if you never delete anything, but if some data is removed from your database, you will need to perform an extra check to handle deletes, which can be very slow.
  2. Clear the content of the index and import all your objects. With this method, you ensure that your index is well synchronized with your database. However, if you receive queries during the import, you will return partial results.  If interrupted, the whole rescan could break your relevance or your service.

So both approaches are somewhat buggy and dangerous.
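For the first approach, the extra delete check amounts to a set difference between the IDs present in the database and those already in the index. A minimal sketch, with illustrative names:

```python
def diff_sync(db_records, index_ids):
    """Compute the operations needed to bring an index in line with the
    database without clearing it: re-send every live record, and delete
    whatever the database no longer contains."""
    db_ids = {r["objectID"] for r in db_records}
    # IDs present in the index but gone from the database must be deleted.
    to_delete = sorted(index_ids - db_ids)
    # A full rescan re-sends every live record as an upsert.
    to_upsert = db_records
    return to_upsert, to_delete
```

The slow part in practice is obtaining `index_ids`, which requires enumerating the whole index; that is exactly why this check is costly on large datasets.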

Another approach: build a new index with another name

Since our API allows the creation of a new index under a different name, you could run your batch import into a new index. Afterward, you would just need to update your front end to send queries to this new index.
Since all indexing jobs are done asynchronously, you first need to check that the indexing job has finished. To allow this, every indexing operation returns an integer (called a TaskID) that lets you check whether the job has been applied; you just have to poll the API until the job is indexed.
But then a problem arises with mobile applications: you cannot change the index name of an application as easily, since most of the time it is a constant in the application code. And even for a website, it means that the batch job has to inform your frontend that the index name has changed. This can be complex.

The elegant solution: move operation

To solve these problems, we implemented a command that is well known on file systems: move. You can move your new index onto the old one, and this atomically replaces the content of the old index with the content of the new one. With this approach, you solve all the previous update problems with one simple procedure. Here’s how you would update an index called “MyIndex”:

  1. Initialize an index “MyIndex.tmp”
  2. Scan your database and import all your data in “MyIndex.tmp”
  3. Move “MyIndex.tmp” to “MyIndex”

You don’t have to make any modification on your backend to catch changes, nor do you need to change the index name on the frontend. Even better, you don’t need to check the indexing status with our TaskID system, since the “move” operation is simply queued after all the “add” operations. All queries will go to the new index once it is ready.
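The three-step procedure can be pictured with a toy in-memory store; in the real API clients, step 3 is a single move call, but the semantics are the same. This sketch is an illustration, not the actual Algolia client:

```python
class IndexStore:
    """Toy in-memory stand-in for an index server, used only to
    illustrate the rebuild-then-move procedure."""
    def __init__(self):
        self.indices = {}

    def add_objects(self, name, objects):
        self.indices.setdefault(name, []).extend(objects)

    def move(self, src, dst):
        # Atomic from the caller's point of view: queries against `dst`
        # see either the old content or the new one, never a mix.
        self.indices[dst] = self.indices.pop(src)

store = IndexStore()
store.add_objects("MyIndex", [{"objectID": "1", "title": "old"}])

# 1. Build the fresh index under a temporary name.
store.add_objects("MyIndex.tmp", [{"objectID": "1", "title": "new"},
                                  {"objectID": "2", "title": "extra"}])
# 2. (In real life: scan the database and import everything here.)
# 3. Atomically replace the live index with the temporary one.
store.move("MyIndex.tmp", "MyIndex")
```

After the move, the temporary name is gone and "MyIndex" serves the rebuilt content, which is why neither the backend nor the frontend ever needs to know a second index name.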

The beauty of the move command

This command is so elegant that even customers who had been sending us realtime updates via our updates API have decided to use this batch update on a regular basis. The move command is a good way to ensure that there are no bugs in your update code, nor divergence between your database and Algolia.

This operation is supported in all twelve of our API clients. We go even further in our Ruby on Rails integration: you need only call the ‘reindex’ method (introduced in 1.10.5) to automatically build a new temporary index and move it on top of the existing one.

The move command is an example of how we try to simplify the life of developers. If you see any other way we can help you, let us know and we’ll do our best to remove your pain!

Common Misperceptions about Search as a Service

Since the first SaaS IPO by salesforce.com, the SaaS (Software as a Service) model has boomed over the last decade into a global market that is worth billions today. It has come a long way, and it took a lot of evangelization to get there.

Before salesforce.com and the other SaaS pioneers succeeded in making SaaS a standard model, IT departments were clear: the infrastructure, as well as the whole stack, had to stay behind their walls. Since then, mindsets have shifted with the cloud revolution, and you can now find software such as Box, Jive or Workday used by many Fortune 500 companies and millions of SMBs and startups.

Everything is now going SaaS, even core product components such as internal search. This new generation of SaaS products is facing the same misperceptions their peers faced years ago. So today, we wanted to dig into the misperceptions about search as a service in general.

Hosting your search is way more complex and expensive than you may think

Some people prefer to go on-premises, since they only pay for the raw resources, especially if they choose to run open source software on them. By doing this, they believe they can skip the margin built into the price of SaaS solutions. The problem is that this view greatly underestimates the Total Cost of Ownership (TCO) of the final solution.

Here are some reasons why hosting your own search engine can get extremely complex & expensive:

Hardware selection

A search engine is particularly IO- (indexing), RAM- (search) and CPU- (indexing + search) intensive. If you want to host it yourself, you need to make sure your hardware is sized for the kind of search you will be handling. We often see companies running their search engine on under-sized EC2 instances that are simply unable to add more resource-consuming features (faceting, spellchecking, auto-completion). Selecting the right instance is more difficult than it seems, and you’ll need to revisit the choice whenever your dataset, feature list or queries per second (QPS) change. Elasticity is not only about adding more servers; it is also about being able to add end-user features. Each Algolia cluster is backed by 3 high-end bare-metal servers with at least the following hardware configuration:

  • CPU: Intel Xeon (E5-1650v2) 6c/12t, 3.5GHz+/3.9GHz+
  • RAM: 128GB DDR3 ECC 1600MHz
  • Disk: 1.2TB SSD (via 3 or 4 high-durability SSD disks in RAID-0)

This configuration is key to provide instant and realtime search, answering queries in <10ms.

Server configuration

It is a common perception among technical people that server configuration is easy: after all, it should just be a matter of selecting the right EC2 Amazon Machine Image (AMI) plus a puppet/chef configuration, right? Unfortunately, this isn’t the case for a search engine. Nearly all AMIs ship with standard kernel settings that are fine at low traffic, but a nightmare as soon as your traffic gets heavier. We’ve been working with search engines for the last 10 years, and we still discover kernel/hardware corner cases every month! To give you a taste of the heavyweight issues you’ll encounter, check out the following points:

  • IO: Default kernel settings are NOT optimized for SSDs! For example, Linux’s I/O scheduler is configured to merge some I/Os to reduce hard-drive latency while seeking disk sectors: this is pointless on an SSD and slows overall server performance.
  • Memory: The kernel caches a lot, and that’s cool… most of the time. When you write data to disk, it is actually written to RAM and flushed to disk later by the pdflush process. Advanced kernel parameters let you configure this behavior. vm.dirty_background_ratio is one of them: it sets the maximum percentage of memory that can be “dirty” (in cache) before it is written to disk. In other words, if you have 128GB of RAM and use the default dirty_background_ratio of 10%, the system will only start flushing the cache when it reaches ~12.8GB! Flushing such bursts of writes will slow down your entire system (even on SSD), killing the speed of all searches and reads. Read more.
  • Network: When calling the listen function in BSD and POSIX sockets, an argument called the backlog is accepted. The backlog argument defines the maximum length of the queue of pending connections for the listening socket. If the backlog argument is higher than the value in net.core.somaxconn, it is silently truncated to that value. The default value is 128, which is way too low! If a connection request arrives when the queue is full, the client may receive an error such as ECONNREFUSED. Read more & even more.
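The memory point above is easy to verify with back-of-the-envelope arithmetic; this small sketch computes the dirty-cache threshold at which background writeback begins:

```python
def dirty_flush_threshold(ram_gb, dirty_background_ratio):
    """How much dirty page cache can accumulate before the kernel starts
    writing it back: vm.dirty_background_ratio is a percentage of RAM."""
    return ram_gb * dirty_background_ratio / 100

# With 128GB of RAM and the default ratio of 10%, nearly 13GB of writes
# can pile up before any background flush begins.
burst = dirty_flush_threshold(128, 10)
```

Because the ratio scales with RAM, large-memory hosts accumulate enormous write bursts; one common remedy is to set an absolute limit via vm.dirty_background_bytes instead of a percentage.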

We’ve been working hard to fine-tune such settings, which today allows us to handle several thousand search operations per second on a single server.

Deployment & upgrades are complex

Upgrading software is one of the main causes of service outages. Deployment should be fully automated and capable of rolling back in case of failure. For a safe deployment, you also need a pre-production setup that duplicates your production environment to validate each new deployment, as well as A/B testing with part of your traffic. Obviously, such a setup requires additional servers. At Algolia, we have test and pre-production servers that let us validate every deployment before upgrading your production cluster. Each time a feature is added or a bug is fixed in the engine, all of our clusters are updated so that everyone benefits from the upgrade.

Toolbox vs features

On-premises solutions were not built to be exposed as a public service: you always need to build extra layers on top of them. And even if these solutions have plenty of APIs and low-level features, turning them into end-user features requires time, resources and a lot of engineering (more than just a full-stack developer!). You may need to re-develop:

  • Auto-completion: to suggest the best products/queries directly from the search bar while handling security & business filters (not only suggesting popular entries);
  • Instant faceting: to provide realtime faceting refreshed at each keystroke;
  • Multi-datacenter replication: to synchronize your data across multiple instances and route queries to the right datacenter to ensure the best search performance all around the world;
  • Query analytics: to get valuable information on what and how people search;
  • Monitoring: to track in realtime the state of your servers, the storage you use, the available memory, the performance of your service, etc.

On-premises is not as secure as one might think

Securing a search engine is very complex, and if you choose to do it yourself, you will face three main challenges:

  1. Controlling who can access your data: You probably have a model that requires permissions associated with your content. Search-as-a-service providers offer packaged features to handle user-based restrictions; for example, you can generate an API key that can only target specific indices. Most on-premises search engines do not provide any access control features.
  2. Protecting yourself against attacks: There are various attacks that your service can suffer from (denial of service, buffer overflow, access control weakness, code injection, etc.). SaaS API providers put a lot of effort into having the best possible security. For example, API providers reacted the most quickly to the “Heartbleed” SSL vulnerability; it took only a few hours after disclosure for Twilio, Firebase and Algolia to fix the issue.
  3. Protecting yourself from unwarranted downloads: The search feature of your website can easily expose a way to grab all your data. Search-as-a-service providers offer packaged features to help prevent this problem (rate limits, time-limited API keys, user-restricted API keys, etc.).
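The access-control point is typically implemented by signing the restrictions with the parent key, so the server can verify them without storing anything per key. The sketch below conveys the idea with HMAC-SHA256; it is similar in spirit to secured API key schemes like Algolia's, but it is not their exact wire format:

```python
import base64
import hashlib
import hmac

def make_scoped_key(parent_key, restrictions):
    """Conceptual sketch of a restricted API key: the restrictions
    (e.g. allowed indices, expiry) are serialized and signed with the
    parent key. The server re-computes the HMAC to verify that the
    restrictions were not tampered with."""
    query = "&".join(f"{k}={v}" for k, v in sorted(restrictions.items()))
    sig = hmac.new(parent_key.encode(), query.encode(),
                   hashlib.sha256).hexdigest()
    # Ship signature + restrictions together as an opaque token.
    return base64.b64encode(f"{sig}{query}".encode()).decode()

key = make_scoped_key("parent-secret", {"restrictIndices": "products",
                                        "validUntil": "1700000000"})
```

Such a key can be handed to a browser or mobile app: it can only perform the queries its embedded restrictions allow, and it expires on its own.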

Mastering these three areas is difficult, and API providers are challenged every day by their customers to provide a state-of-the-art level of security in all of them. Reaching the same level of security with an on-premises solution would simply require too much investment.

Search as a service is not reserved for simple use cases

People tend to believe that search as a service is only good for basic use cases, which prevents developers from implementing fully featured search experiences. The fact of the matter is that search as a service simply handles all of the heavy lifting while keeping the flexibility to easily configure the engine. It therefore enables any developer, even a front-end-only developer, to build complex instant search implementations with filters, faceting or geo-search. For instance, feel free to take a look at JadoPado, a customer who developed a fully featured instant search for their e-commerce store.

By contrast, because an on-premises solution runs inside your walls once in production, you need a dedicated team to constantly track and fix the multiple issues you will encounter. Who would think of having a team dedicated to ensuring their CRM software works fine? It makes no sense if you use SaaS software, as most people do today. Why should it make more sense for a component such as search? All the heavy lifting and operational costs are concentrated in the SaaS providers’ hands, eventually making it far more cost-efficient for you.

A New Way to Handle Synonyms in a Search Engine

We recently added support for synonyms in Algolia! It has been our most requested feature since our launch in September. While it may seem simple, it actually took us some time to implement because we wanted to do it differently than classic search engines.

What’s wrong with synonyms

There are two main problems with how existing search engines handle synonyms. These issues hurt the user experience and can make users think “this search engine is buggy”.


  1. In most search engines, synonyms are not compatible with typeahead search. For example, if you want tablet to be treated as equal to ipad in a query, the prefix searches t, ta, tab, tabl & table will not trigger the expansion to iPad; only the full tablet query will. Thus, a single new letter in the search bar can totally change the result set, catching users off guard.
  2. Highlighting matched text is a key element of the user experience, especially when the search engine tolerates typos. It is the difference between making users think “I don't understand this result” and “this engine was able to understand my errors”. Synonym expansions are rarely highlighted, which breaks users' trust in the search results and can feel like a bug.
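The first problem can be illustrated with a toy example (not a real engine): naive synonym expansion fires only on an exact token match, so no prefix of tablet ever expands to ipad, and the result set suddenly changes on the last keystroke:

```javascript
// Toy illustration of exact-match synonym expansion breaking typeahead.
const synonyms = { tablet: ['ipad'] };

function expandQuery(query) {
  // Naive expansion: the synonym fires only on an exact token match.
  return [query].concat(synonyms[query] || []);
}

console.log(expandQuery('tabl'));   // ['tabl'] — no expansion while typing
console.log(expandQuery('tablet')); // ['tablet', 'ipad'] — sudden change
```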

Our implementation

We have identified two different use cases for synonyms: equalities and placeholders. The first and most common use case is when you tell the search engine that several words must be considered equal, for example st and street in an address. The second use case, which we call a placeholder, is when you indicate that a specific token can be replaced by a set of possible words, and that the token itself is not searchable. For example, the content <number> street could be matched by the queries 1st street or 2nd street but not by the query number street.
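As an illustration, both use cases could be declared in the index settings along these lines; the exact setting names (synonyms, placeholders) are an assumption about the configuration shape, not a verbatim excerpt from the documentation:

```json
{
  "synonyms": [
    ["st", "street"],
    ["black", "dark"]
  ],
  "placeholders": {
    "<streetnumber>": ["1st", "2nd", "3rd"]
  }
}
```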

For the first use case, we have added support for synonyms that is compatible with prefix search, and we have implemented two different ways to do the highlighting (controlled by the replaceSynonymsInHighlight query parameter):

  1. A mode where the original word that matched via a synonym is highlighted. For example, if you have a record that contains black ipad 64GB and a synonym stating that black equals dark, then the queries ipad d, ipad da, ipad dar & ipad dark will all fully highlight the word black. Typeahead search keeps working and the synonym expansion is fully highlighted: black ipad 64GB.
  2. A mode where the original word is replaced by the synonym, and the matched prefix is highlighted. For example, the query ipad d will replace black with dark and highlight the first letter of dark: dark ipad 64GB. This mode makes it possible to fully explain the results when the original word can safely be replaced by the matched synonym.
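A toy sketch of the two modes (not Algolia's actual implementation) for the record word black, the synonym dark and the typed prefix d:

```javascript
// Simplified sketch of the two replaceSynonymsInHighlight modes.
function highlight(recordWord, synonym, typedPrefix, replaceSynonym) {
  if (replaceSynonym) {
    // Mode 2: show the synonym and highlight only the matched prefix.
    return '<em>' + typedPrefix + '</em>' + synonym.slice(typedPrefix.length);
  }
  // Mode 1: keep the original record word and highlight it fully.
  return '<em>' + recordWord + '</em>';
}

console.log(highlight('black', 'dark', 'd', false)); // <em>black</em>
console.log(highlight('black', 'dark', 'd', true));  // <em>d</em>ark
```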

For the second use case, we have added support for placeholders. You can add a specific token to your records that will be safely replaced by a set of words defined in your configuration. The highlighting mode that replaces the original word by the expansion makes total sense here. For example, if you have a <streetnumber> mission street record with the placeholder <streetnumber> = [ "1st", "2nd", ... ], then the query 1st mission street will replace <streetnumber> with 1st and highlight all three words: 1st mission street.
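Here is a toy illustration (again, not the engine's actual code) of how a placeholder token in a record can match any word from its configured list, while a regular word must match exactly:

```javascript
// Placeholder tokens map to the set of words they may stand for.
const placeholders = { '<streetnumber>': ['1st', '2nd', '3rd'] };

function matchWord(recordWord, queryWord) {
  const allowed = placeholders[recordWord];
  if (allowed) return allowed.includes(queryWord); // placeholder token
  return recordWord === queryWord;                 // regular word
}

function matches(record, query) {
  const recordWords = record.split(' ');
  const queryWords = query.split(' ');
  if (recordWords.length !== queryWords.length) return false;
  return queryWords.every((qw, i) => matchWord(recordWords[i], qw));
}

console.log(matches('<streetnumber> mission street', '1st mission street'));    // true
console.log(matches('<streetnumber> mission street', 'number mission street')); // false
```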

We believe this is a better way to handle synonyms and we hope you will like it :) We would love to get your feedback and ideas for improvement on this feature! Feel free to contact us at hey(at)algolia.com.

Why JSONP is still Mandatory

At Algolia, we are convinced that search queries need to be sent directly from the browser (or mobile app) to the search engine in order to deliver a realtime search experience. This is why we have developed a search backend that replies within a few milliseconds, through an API that handles security when called from the browser.

Cross domain requests

For security reasons, the default behavior of a web browser is to block all queries that go to a domain different from the website they are sent from. So when using an external HTTP-based search API, all of your queries would be blocked because they are sent to an external domain. There are two methods to call an external API from the browser:


The JSONP approach is a workaround that consists of calling an external API with a DOM <script> tag. The <script> tag is allowed to load content from any domain without security restrictions. The targeted API needs to expose an HTTP GET endpoint and return JavaScript code instead of the regular JSON data. You can use jQuery to dynamically call a JSONP URL:
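For example, a jQuery call along the following lines performs a JSONP request (the URL is a placeholder):

```javascript
// dataType: 'jsonp' makes jQuery inject a <script> tag and append a
// generated callback parameter to the URL.
$.ajax({
  url: 'https://api.example.com/search?query=ipad',
  dataType: 'jsonp',
  success: function(data) {
    console.log(data);
  }
});
```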

In order to retrieve the API response from the newly included JavaScript code, jQuery automatically appends a callback argument to your URL (for example &callback=method12), which must be called by the JavaScript code that your API generates.

This is what a regular JSON reply would look like: 
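For instance, for a hypothetical search response:

```json
{ "hits": [{ "name": "iPad Air" }], "nbHits": 1, "query": "ipad" }
```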

Instead, the JSONP-compliant API generates:
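Using the generated callback name from the URL (method12 in the example above), the JSON data is wrapped in a function call, for example:

```javascript
method12({ "hits": [{ "name": "iPad Air" }], "nbHits": 1, "query": "ipad" });
```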

Cross Origin Resource Sharing

CORS (Cross-Origin Resource Sharing) is the proper approach to perform a call to an external domain. If the remote API is CORS-compliant, you can use a regular XMLHttpRequest JavaScript object to perform the API call. In practice, the browser first performs an HTTP OPTIONS request to the remote API to check which caller domains are allowed and whether it is authorized to execute the requested URL.

For example, here is a CORS preflight request issued by a browser. The most important lines are the last two headers, which specify the permissions being checked: in this case, the POST method and three specific HTTP headers.
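Such a preflight might look as follows; the path and header names are illustrative, not a captured trace:

```
OPTIONS /1/indexes/products/query HTTP/1.1
Host: api.example.com
Origin: https://www.example.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: x-algolia-api-key, x-algolia-application-id, content-type
```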

The server reply will be similar to this one:
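For example (again illustrative rather than a captured trace):

```
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST
Access-Control-Allow-Headers: x-algolia-api-key, x-algolia-application-id, content-type
Access-Control-Max-Age: 86400
```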

This answer indicates that the POST method can be called from any domain (Access-Control-Allow-Origin: *) and with the requested headers.

CORS has many advantages. First, it gives access to a real REST API with all HTTP verbs (mainly GET, POST, PUT and DELETE), and it also allows better error handling in the API (bad requests, object not found, …). The major drawback is that it is only supported by modern browsers (Internet Explorer ≥ 10, Firefox ≥ 3.5, Chrome ≥ 3, Safari ≥ 4 & Opera ≥ 12; Internet Explorer 8 & 9 provide partial support via the XDomainRequest object).

Our initial conclusion

Because of the advantages of CORS in terms of error handling, we started with a CORS implementation of our API. We also added specific support for Internet Explorer 8 & 9 using the XDomainRequest JavaScript object (these versions do not support XMLHttpRequest). The main difference is that XDomainRequest does not support HTTP headers, so we added another way to specify user credentials, in the body of the POST request (initially they were only supported via HTTP headers).

We were confident that we were supporting almost all browsers with this implementation, as only very old browsers could cause problems. But we were wrong!

CORS problems

The reality is that CORS still causes problems, even with modern browsers. The biggest problem we found was with some firewalls/proxies that refuse HTTP OPTIONS queries. We even found software on some computers that was blocking CORS requests, such as the Cisco AnyConnect VPN client, which is widely used in the enterprise world. We discovered this issue when a TechCrunch employee was unable to use search on crunchbase.com because the AnyConnect VPN client was installed on his laptop.

Even in 2014, with a large majority of browsers supporting CORS, it is not possible to achieve perfect service quality with a CORS-enabled REST API!

The solution

Using JSONP is the only way to ensure great compatibility with old browsers and to handle problems with misconfigured firewalls/proxies. However, CORS offers the advantage of proper error handling, so we did not want to limit ourselves to JSONP.

In the latest version of our JavaScript client, we decided to use CORS with a fallback to JSONP. At client initialization, we check whether the browser supports CORS, and then perform an OPTIONS query to check that no firewall/proxy blocks CORS requests. If there is any error, we fall back to JSONP. All of this logic is available in our JavaScript client without any API/code change for our customers.
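The transport decision can be sketched as a simplified function (this is not the client's actual code; the real client additionally sends the OPTIONS probe described above before committing to CORS):

```javascript
// Decide which transport to use, given a global object (window in a browser).
function pickTransport(global) {
  if (global.XMLHttpRequest &&
      'withCredentials' in new global.XMLHttpRequest()) {
    return 'CORS';           // modern browsers
  }
  if (global.XDomainRequest) {
    return 'XDomainRequest'; // IE8/9 partial CORS support
  }
  return 'JSONP';            // everything else, or blocked CORS
}

// In a browser you would call pickTransport(window).
```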

Having CORS support with an automatic fallback to JSONP is the best way we have found to ensure great service quality and to support all corner-case scenarios. If you see another way to do it, your feedback is very welcome.