
Realtime Search: Security and our Javascript Client

Edit: As suggested on Hacker News, a plain SHA256 hash of the key concatenated with the data is not secure, because it is vulnerable to a length extension attack. We have replaced it with HMAC-SHA256.

Instant is in our DNA, so our first priority was to build a search backend able to return relevant realtime search results in a few milliseconds. However, the backend is just one variable in our realtime equation. The response time perceived by the end user is the total time elapsed between their first keystroke and the final display of their results. With an extremely fast backend, solving this equation therefore comes down to optimising network latency. We solve this issue in two steps:

 

  • Second, to further reduce this perceived latency, queries must be sent directly from the end users’ browsers or mobile phones to our servers. To avoid intermediaries such as your own servers, we offer a JavaScript client for websites and ObjC/Android/C# clients for mobile apps.

The security challenge of JavaScript

Using this client means that you need to include an API key in your JavaScript (or mobile app) code. The first security issue with this approach is that anyone can retrieve this key simply by looking at the page's source code. With a key that has write permissions, that person could modify the content behind your website or mobile application! To fix this problem, we provide search-only API keys, which protect your indexes from unauthorized modifications.

This was a first step, and we quickly had to solve two other security issues:

  • Limiting the ability to crawl your data: you may not want people to retrieve all your data through continuous querying. The simple solution was to limit the number of API calls a user could perform in a given period of time, which we implemented as a rate limit per IP address. However, this approach breaks down when many users sit behind a global firewall and therefore share one IP address, a very common situation for our corporate users.
  • Securing access control: you may need to restrict a user's queries to specific content. For example, you may have power users who should get access to more content than “regular” users. The easy way to do this is with filters. The problem with plain filters in your JavaScript code is that people can figure out how to modify them and gain access to content they are not supposed to see.
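To make the shared-IP problem concrete, here is a minimal sketch of a fixed-window rate limiter keyed by an arbitrary identifier, written for Node.js. It is not Algolia's implementation: the class name, the limit of two calls per minute and the 203.0.113.7 address are illustrative assumptions. When the identifier is the client IP, everyone behind the same firewall draws from one budget.

```javascript
// Hypothetical fixed-window rate limiter: each identifier gets
// `maxRequests` calls per window of `windowMs` milliseconds.
class FixedWindowRateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests; // allowed calls per window
    this.windowMs = windowMs;       // window length in milliseconds
    this.counters = new Map();      // identifier -> { windowStart, count }
  }

  // Returns true if the call is allowed, false if the identifier is over its limit.
  allow(identifier, now = Date.now()) {
    const entry = this.counters.get(identifier);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First call for this identifier, or the previous window has expired.
      this.counters.set(identifier, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count < this.maxRequests) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}

// Three employees behind the same corporate firewall share one IP address.
const limiter = new FixedWindowRateLimiter(2, 60000);
console.log(limiter.allow("203.0.113.7", 0));  // alice: true
console.log(limiter.allow("203.0.113.7", 10)); // bob: true
console.log(limiter.allow("203.0.113.7", 20)); // carol: false, shared budget exhausted
```

Carol is blocked even though she has sent a single query, which is exactly why a per-IP limit is unfair to corporate users.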

How we solve it altogether

Today, most websites and applications require people to create an account and log in to access a personalized experience (think of CRM applications, Facebook or even Netflix). We decided to use these user IDs to solve both issues by creating signed API keys. Let’s say you have an API key with search-only permission and want to restrict a specific user (id=42) to two groups of content (public OR power_users_only):

api_key=20ffce3fdbf036db955d67645bb2c993
query_filters=(public,power_users_only)
user_token=42

You can then generate, in your backend, a secured API key defined as an HMAC-SHA256 of these three elements: the API key is the secret, and the query filters concatenated with the user token form the message:

secured_api_key=HMAC_SHA_256(api_key, query_filters + user_token)
secured_api_key=HMAC_SHA_256("20ffce3fdbf036db955d67645bb2c993", "(public,power_users_only)" + "42")
secured_api_key="3abb95c273455ce9b57c61ee5258ba44093f17022dd4bfb39a37e56bee7d24a5"
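For readers not on Rails, the formula above can be sketched with Node.js's built-in `crypto` module. The function name `generateSecuredApiKey` is hypothetical, and the exact string encoding Algolia feeds into the HMAC is an assumption here, so this sketch is not guaranteed to reproduce the hash shown above.

```javascript
const crypto = require("crypto");

// Hypothetical helper: the search-only API key is the HMAC secret, and the
// query filters concatenated with the user token form the message.
function generateSecuredApiKey(apiKey, queryFilters, userToken) {
  return crypto
    .createHmac("sha256", apiKey)
    .update(queryFilters + userToken)
    .digest("hex"); // 64 hexadecimal characters
}

const securedKey = generateSecuredApiKey(
  "20ffce3fdbf036db955d67645bb2c993",
  "(public,power_users_only)",
  "42"
);
console.log(securedKey);
```

Using an HMAC rather than a bare SHA256 of the concatenated strings is what rules out the length extension attack mentioned in the edit note at the top of this post.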

For example, if you are using Rails, the code in your backend would be:

secured_key = Algolia.generate_secured_api_key('20ffce3fdbf036db955d67645bb2c993', 
                                               '(public,power_users_only)', '42')

You can then initialize your JavaScript code with the secured API key and associated information:

var algolia = new AlgoliaSearch('YourApplicationID', '3abb95c273455ce9b57c61ee5258ba44093f17022dd4bfb39a37e56bee7d24a5');
algolia.setSecurityTags('(public,power_users_only)');
algolia.setUserToken('42');
algolia.initIndex('YourIndex').search($('#q').val(), function(success, content) {
  // [...]
});

The user identifier (defined by setUserToken) is used instead of the IP address for the rate limit, and the security filters (defined by setSecurityTags) are automatically applied to the query.
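On the server side, the choice of rate-limit bucket can be sketched as follows. The request shape (`ip`, `userToken`) and the `user:`/`ip:` prefixes are hypothetical; the only point illustrated is that the user token, when set, replaces the IP address as the identifier.

```javascript
// Hypothetical request shape: { ip, userToken }.
// Prefer the signed user token; fall back to the IP when no token is set.
function rateLimitKey(request) {
  return request.userToken ? "user:" + request.userToken : "ip:" + request.ip;
}

const sameFirewall = "203.0.113.7";
console.log(rateLimitKey({ ip: sameFirewall, userToken: "42" })); // prints user:42
console.log(rateLimitKey({ ip: sameFirewall, userToken: "43" })); // prints user:43
console.log(rateLimitKey({ ip: sameFirewall }));                  // prints ip:203.0.113.7
```

Two users behind the same firewall now land in separate buckets, while anonymous traffic keeps the per-IP behavior.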

In practice, if a user wants to overstep her rights, she will need to modify her security tags and compute the matching hash, which requires knowing the original API key. Our backend checks whether a query is legitimate by computing all the possible hashes using all your available API keys for the queried index, together with the security tags defined in the query and the user identifier (if set). If none of the hashes we computed matches the hash sent with the query, we return a permission denied error (403). Don’t worry: reverse-engineering the original API key by brute force would require years and thousands of cores.
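The check described above can be sketched in Node.js as a loop over the index's API keys. This is an illustration under assumptions, not Algolia's backend: the function name `checkSecuredQuery`, its parameters and the plain status-code return values are hypothetical. The comparison uses `crypto.timingSafeEqual` so the check does not leak, through its timing, how many bytes of a guess were correct.

```javascript
const crypto = require("crypto");

// Hypothetical check: recompute HMAC-SHA256(apiKey, securityTags + userToken)
// for every API key of the index and compare it with the hash sent by the client.
function checkSecuredQuery(indexApiKeys, securedApiKey, securityTags, userToken = "") {
  const received = Buffer.from(securedApiKey, "hex");
  for (const apiKey of indexApiKeys) {
    const expected = crypto
      .createHmac("sha256", apiKey)
      .update(securityTags + userToken)
      .digest();
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    if (received.length === expected.length &&
        crypto.timingSafeEqual(received, expected)) {
      return 200; // a key matched: run the query with the security tags applied
    }
  }
  return 403; // no key produced this hash: permission denied
}

const keys = ["20ffce3fdbf036db955d67645bb2c993"];
const tags = "(public,power_users_only)";
const signed = crypto.createHmac("sha256", keys[0]).update(tags + "42").digest("hex");
console.log(checkSecuredQuery(keys, signed, tags, "42"));                  // 200
console.log(checkSecuredQuery(keys, signed, "(public,admin_only)", "42")); // 403
```

Changing either the tags or the user token without the secret key yields a different expected hash, so the tampered query is rejected.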

Security filters and per-user rate limiting are independent: if you don’t need both features, you can use only one of them, for example to apply security filters without limiting the rate of queries.

We launched this new feature a few weeks ago and have received very good feedback so far. Our customers no longer need to choose between security and realtime search. If you see any way to improve this approach, we would love to hear your feedback!

  • AreaMan

    If I can hack into the web browser of a highly-authorized Algolia user, I could grab their credentials and have their browser send them to my server, right?

    Or better yet, have the highly-authorized user perform my query and send me the results.

    • jackowayed

      If you can get a hack into the web browser of a user, it’s game over, you can get their login credentials/cookie and do whatever you want, regardless of what Algolia does. So that issue is outside the “threat model” that it’s reasonable for Algolia to design for.

    • In that case you would be able to query that user's data (search only). The risk is the same as having access to an authenticated account of that user (mail, Facebook, …).

  • Concerned Reader
    • You are right, we have updated the blog post and our code according to that attack.

  • Terry A Davis

    Hash function might be adding ASCII letters.
    “BAA”=2+1+1=4
    “DD”=4+4=8
    “CA”=3+1=4
    You can use hash functions to collect only unique words.

  • Stephane

    What prevents an attacker from creating many user identifiers?

    • The question is how you would do that. Because the secured key is an HMAC-SHA256, using a different user identifier would require brute-forcing the API key.

      • Stephane

        Sorry I misread.. The secured API key is generated on the backend so my question is stupid 🙂

  • What about revocability?

    You might want to add an emission date and hash it along with the other information, to partially address it…

    secured_api_key=Hash(api_key, query_filters + user_token, utc_now())

    • In fact, we expose an API to create API keys (the secrets in HMAC-SHA256). We support ephemeral API keys (http://www.algolia.com/doc#Security), so we already address this use case 🙂
      The standard use of secured API keys is to have an ephemeral API key with an expiration date a few days in the future and to generate a new API key every day.

  • PLNech

    An error prevents the last example from being displayed, as its code is inside a tag: