A Blog About Programming, Search & User Experience

Dealing with OpenSSL Heartbleed Vulnerability

Yesterday, the OpenSSL project released an update to fix a serious security issue. This vulnerability was disclosed in CVE-2014-0160 and is more widely known as the Heartbleed vulnerability. It allows an attacker to read the contents of a server's memory. Given the widespread use of OpenSSL and the versions affected, this vulnerability affects a large percentage of services on the internet.

Once the exploit was revealed, we responded immediately and all Algolia services were secured the same day, by 3pm PDT on Monday, April 7th. The fix was applied on all our API servers and our website. We then generated new SSL certificates with a new private key.

Our website also depends on Amazon Elastic Load Balancer, which was affected by this issue and was updated later, on Tuesday, April 8th. We then changed the website certificate.

All Algolia servers are no longer exposed to this vulnerability.

Your credentials

We took the time to analyze the past activity on our servers and did not find any suspicious activity. We are confident that no credentials have leaked. However, given that this exploit existed in the wild for such a long time, it is possible that an attacker could have stolen API keys or passwords without our knowledge. As a result, we recommend that all Algolia users change the passwords on their accounts. We also recommend that you reset your Algolia administration API key, which you can do at the bottom of the “Credential” section of your dashboard. Be careful to update it everywhere it is used in your code (once you have patched your own SSL library, if you were also vulnerable).

Security at Algolia

The safety and security of our customer data are our highest priorities. We are continuing to monitor the situation and will respond rapidly to any other potential threats that may be discovered.

If you have any questions or concerns, please email us directly at security@algolia.com.

How To Better Know Your Users: Introducing Algolia’s Search Analytics

This week we released a feature much requested by our customers: analytics. Here are some of the top features that are now available to all our users!

Why should you care?

The Web is the new application platform, and at Algolia we hope to revolutionize the way people search and access content inside these Web and mobile services. Think about Spotify, LinkedIn, Amazon: everyone wants to find the right songs, people and products in just a couple of keystrokes. Fast and meaningful access to all this content via a simple search box: that is the challenge we are tackling. In March, we answered more than 200 million user queries for our customers across all continents.

For our customers, providing the right content through the right search and browsing experience is key. Understanding their users, what they like, what they want and when they want it, is just as important, if not more so. This is why we came up with this new analytics section, built on top of our API and available on our customers' online dashboards when they log in to their Algolia account. So what exactly do we track for you?

Most popular queries

This chart shows which items are searched for the most. It would be useful to a procurement department anticipating inventory needs for the most searched products. And if you monetize your service through advertising, think about the value of knowing what people are most interested in.

Most popular queries

Queries with few or no results

Today, most services are simply clueless about what is missing from their content base. How do you know whether your catalogue of products meets your users' expectations? Business-wise, knowing that you do not provide what your users need is of critical importance.

Top queries with few or no results

How does a query evolve over time?

Is Chanel more popular than Louis Vuitton in the morning or at night? Are bikes more popular in June or in December? With this new feature, you can now answer such questions for your own content by following the number of times a specific query is typed on an hourly basis.

Evolution of the query “louboutin” over 24 hours

Which categories do people search the most?

When users type a query, they often use categories to refine the results. We now let you know which categories are used most for refinement. We even provide the most used combinations of categories (such as “dress” + “blue” + “size M”). It should help you understand how your users browse your content and, more broadly, whether the ergonomics of your app are optimized.

Top categories

These new analytics features are included in our existing plans at no extra cost; we simply limit the number of days of analytics history based on the plan you choose. We hope you will like them, and we will be more than happy to read your feedback and feature requests!

On HipChat’s blog: Algolia extends HipChat to customer support

As you probably know, we are using HipChat to power our live-help chat. If you want to know more, go ahead and read our guest post on HipChat's blog.

Live chat

On Leanstack: The tech stack behind Algolia’s realtime search service

We recently sat down with Yonas, founder of Leanstack, to talk about our tech stack. The interview also includes a lot of details about how we got started and what makes our technology different. Check it out on the Leanstack blog!

I highly recommend signing up for Leanstack's email updates. They are full of great information about new developer tools and cloud services!

Realtime Search: How we ensure security with our JavaScript client

Edit: As suggested on Hacker News, plain SHA-256 is not secure here because it allows a length extension attack; we have replaced it with HMAC-SHA256.

Realtime is in our DNA, so our first priority when we created Algolia was to build a search backend able to return relevant results in a few milliseconds. However, the backend is only one of the variables in the realtime equation. Indeed, the response time perceived by end users is the total time between their keystroke and the display of the results. With an extremely fast search backend, solving this equation comes down to tackling network latency. This is an issue we solve in two steps:

  • First, we host our clusters in datacenters located close to our customers' end users, which keeps the network round trip, and therefore the perceived latency, as low as possible.
  • Then, to keep reducing this perceived latency, queries must be sent directly from the end users' browsers or mobile phones to our servers, avoiding any intermediary such as your own servers. This is why we offer a JavaScript client for websites and ObjC/Android/C# clients for mobile apps.

The security challenge of JavaScript

Using this client means that you need to include an API key in your JavaScript (or mobile app) code. The first security issue with this approach is that this key can easily be retrieved by anyone simply looking at the code of the page, potentially allowing that person to modify the content behind the website/mobile application. To fix this problem, we started providing search-only API keys, which protect your indexes from unauthorized modifications.

This was a first step, and we quickly had to solve two other security issues:

  • Limiting the ability to crawl your data: you may not want people to grab all your data simply by querying it repeatedly. The simple solution is to limit the number of API calls a user can perform in a given period of time, which we implemented as a rate limit per IP address. However, this approach is not acceptable when many users sit behind a single global firewall (and thus share one IP address), which is very likely for our corporate users.
  • Securing access control: you may need to restrict a user's queries to specific content. For example, you may have power users who should get access to more content than “regular” users. The easy way to do this is with filters. The problem with simple filters in your JavaScript code is that people can figure out how to modify them and get access to content they are not supposed to see.

How we solve both at once

Today, most websites and applications require people to create an account and log in in order to get a personalized experience (think of CRM applications, Facebook or even Netflix). We decided to use these user IDs to solve both issues by creating signed API keys. Let's say you have an API key with search-only permission and want to apply a filter on two groups of content (public OR power_users_only) for a specific user (id=42):

You can generate in your backend a secured API key, defined as an HMAC-SHA256 hash of three elements:

  • your search-only API key,
  • the security tags allowed for this user,
  • the user identifier (user token).

For example, if you are using Rails, the backend code could look like the sketch below.
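A minimal Ruby sketch, assuming the HMAC is computed over the security tags followed by the user token, with the search-only API key as the secret (the exact concatenation, the tag syntax and the variable names are assumptions, not the definitive implementation):

    require 'openssl'

    search_only_api_key = 'YourSearchOnlyApiKey'       # placeholder key
    security_tags       = '(public,power_users_only)'  # public OR power_users_only (assumed syntax)
    user_token          = '42'                         # the logged-in user's id

    # HMAC-SHA256 of the restrictions, keyed with the search-only API key
    secured_api_key = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('sha256'),
                                              search_only_api_key,
                                              security_tags + user_token)

You would then pass this secured key, together with the security tags and the user token, to the page that initializes the search.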

You can then initialize your JavaScript code with this secured API key and the associated information (the security tags and the user token).

The user identifier (defined by SetUserToken) is used instead of the IP address for the rate limit and the security filters (defined by SetSecurityTags) are automatically applied to the query.

In practice, if a user wants to overstep her rights, she will need to modify her security tags and figure out the new hash. Our backend checks whether a query is legitimate by computing all the possible hashes using all your available API keys for the queried index, as well as the security tags defined in the query and the user identifier (if set). If there is no match between the hash of the query and the ones we computed, we return a permission denied (403). Don't worry: reverse-engineering the original API key by brute force would require years and thousands of cores.
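To make that check concrete, here is a sketch of the verification logic in Ruby, assuming the same concatenation as when the key was generated (an illustration, not our actual backend code):

    require 'openssl'

    # Returns true if the provided secured key matches one of the candidate API keys
    # for this index, given the security tags and user token sent with the query.
    def valid_secured_key?(provided_key, candidate_api_keys, security_tags, user_token)
      candidate_api_keys.any? do |api_key|
        expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('sha256'), api_key,
                                           "#{security_tags}#{user_token}")
        expected == provided_key
      end
    end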

You may want to apply security filters without caring about rate limits, or the other way around: if you don't need both of these features, you can use only one.

We launched this new feature a few weeks ago and have received very good feedback so far. Our customers no longer need to choose between security and search speed. If you see any way to improve this approach, we would love to hear your feedback!

What Caused Today’s Performance Issues In Europe And Why It Will Not Happen Again

For a few hours on March 17th, you may have noticed longer response times for some of the queries sent by your users.

Average latency for one of our European clusters on March 17th

As you can see above, our slowest average response time (from the user's browser to our servers and back to the user's browser) on one of our European clusters peaked at 858ms. On a normal day, this peak is usually no higher than 55ms.

This was clearly not normal behavior for our API, so we investigated.

How indexing and search calls share resources

Each cluster handles two kinds of calls on our REST API: the ones to build and modify the indexes (Writes) and the ones to answer users’ queries (Search). The resources of each cluster are shared between these two uses. As Write operations are far more expensive than Search calls, we designed our API so that indexing should never use more than 10% of these resources.

Getting into more detail: up until now, we set a limit on the rate of Writes per HTTP connection, while allowing each Write to include a batch of operations (such as adding 1M products to an index in a single network call). There is no such limit for queries (Search calls); we simply need to limit Write calls to keep a perfect quality of service for search. We highly recommend batching operations per Write (up to 1GB of operations per batch) rather than sending them one by one, as the Write rate limit may be reached quickly: here lies the origin of yesterday's issues.

What happened yesterday is that on one of our European clusters, one customer pushed so many unbatched indexing calls from different HTTP connections that they massively outnumbered the search calls of the other users of the cluster.

This eventually increased the average response time for the queries on this cluster.

The Solution

As of today, we set the Write rate limit per account, and no longer per HTTP connection. This prevents anyone from using multiple connections to bypass the Write rate limit. It also means that customers who want to push a lot of operations in a short time simply need to send their calls in batches.

How do you do this? It is already covered in our documentation; see for instance our Ruby client: https://github.com/algolia/algoliasearch-client-ruby#batch-writes
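As a rough illustration (not taken from the documentation), assuming the algoliasearch Ruby gem and a products array of record hashes, batching could look like this:

    require 'algoliasearch'

    Algolia.init(application_id: 'YourApplicationID', api_key: 'YourAdminAPIKey')
    index = Algolia::Index.new('products')

    # One Write call per batch of 1,000 records instead of one call per record
    products.each_slice(1000) do |batch|
      index.add_objects(batch)
    end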

Algolia Heroku add-on enters GA


We launched the first beta of our Heroku add-on in October 2013 and are now happy to announce its general availability!

During the beta period we received excellent feedback (and some bug reports!) that helped us improve our integration. We are now fully ready to serve production traffic in both Heroku datacenters. If you were part of our beta program, we will contact you shortly to invite you to migrate to a standard plan.

You can directly install it from our Heroku add-on page and as ever, please let us know if you have any feedback!

Algolia is now available in Asia, here is why


One of the terrific advantages of building a SaaS company is that your clients can be anywhere in the world! We now have customers in more than 15 different countries distributed in South America, Europe, Africa, and of course North America. We feel incredibly lucky to get so many international customers trusting us with their search.

Language support is one of the keys that enabled us to enter these markets. From the beginning, we wanted to support every language used on the internet. To back our vision with action, we developed over time very good support for Asian languages. For example, we are able to automatically retrieve results in Traditional Chinese when the query is in Simplified Chinese (or vice versa). You simply need to add objects in Chinese, Japanese or Korean, and we handle the language processing for you.

Despite this good handling of Asian languages, we didn't plan to open an Asian datacenter so early, mainly because we thought the API-as-a-service market was less mature in Asia than in the US or Europe. But we were surprised when an article on 36kr.com brought us dozens of signups from China. We got more signups from China in the past month than from Canada!

One of our core values is the speed of our search engine. To provide a realtime search experience, we want response times to be lower than 100ms, including the round trip to the search servers. In this context, low latency is essential. Up to now we have been able to cover North America and Europe in less than 100ms (search computation included), but our latency to Asia was between 200ms and 300ms.

The first step of our on-boarding process is to select the datacenter where your search engine is hosted (we offer multi-datacenter distribution only for enterprise users). Interestingly, we discovered that we had no drop-off for European and US users, but it became significant for others. It was a difficult choice for people outside these two regions, such as in India or Japan; it is actually a difficult choice for anyone living between two datacenters. So we now also display the latency from your browser and pre-select the closest datacenter, latency-wise.

To offer better latency and reduce friction in the on-boarding process, it was clear that we had to add a datacenter in Asia. We chose Singapore for its central location. Unfortunately, the hosting market is very different in Asia and it is much more expensive to rent servers, so we sadly had to add a premium on plan prices for this datacenter.

We are very happy to open this new datacenter in Asia with a latency that reaches our quality standards. We are even happier to be able to help multinational websites and apps provide a great search experience to all their users across Europe, North America and Asia in less than 100ms with our multi-datacenter support (available for Enterprise).

Algolia is now part of AirPair’s premium support program

While we are proud to offer an easy-to-integrate API, there are still times when expertise can make a difference for the end users of our search customers. For such circumstances, we are excited to join AirPair's premium support program! It will allow our customers to work with trusted Algolia experts on our API integration, engine configuration, and other topics linked with search.

AirPair is an extensive network of technical experts with deep knowledge across many technology stacks and solutions (Hadoop, iOS, SAP integration, MongoDB sharding, etc.). They accelerate software development by “pairing” experts with customers in real time via video and screen sharing, leading to better software, produced faster and at lower cost.

To receive help and assist fellow developers, visit the Algolia Experts page on AirPair.

Introducing a non-intrusive way to onboard and activate: Connectors

Most of our users are technical people. They love writing code, and we love providing them with API clients in the major programming languages (we currently support 10 platforms). They are doers. They love prototyping. Just like us, they work for startups that need to move fast and get things done, keeping in mind that done is better than perfect. Nevertheless, they don't want to waste time. In this post, I will explain how one used our API until now, and how we introduced SQL and MongoDB connectors to ease integration and testing.

First steps with our API

Up until now, our onboarding process asked you to try the API by uploading your data. We place a lot of importance on our documentation and make sure our users do not need more than a few minutes to integrate our REST API. Nevertheless, exporting your application's data to a JSON or CSV file is often more complex than it appears, especially when you have millions of rows – and especially because developers are lazy :) no worries, that's totally OK. It is something you may not be willing to do, especially just to try a service.

Initial import

90% of our users use an SQL or MongoDB database. Exporting a table or a collection to a JSON file can be easy if you are using a framework, for example with Ruby on Rails:
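For instance, a minimal sketch from a Rails console, assuming a Product model (the model and file names are placeholders):

    # Dump the whole table as JSON (fine for small tables, memory-hungry for big ones)
    File.write('products.json', Product.all.to_json)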

It is more annoying if, for example, you are using PHP without any framework.

In both cases, it gets harder if you want to export millions of rows without consuming hundreds of GB of RAM, so you will need to use our API clients:
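Here is a sketch of a batched import, assuming a Rails Product model and the algoliasearch Ruby gem (credentials and index name are placeholders):

    require 'algoliasearch'

    Algolia.init(application_id: 'YourApplicationID', api_key: 'YourAdminAPIKey')
    index = Algolia::Index.new('products')

    # Stream the table in fixed-size batches so memory usage stays flat
    Product.find_in_batches(batch_size: 1000) do |batch|
      index.add_objects(batch.map(&:attributes))
    end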

Incremental updates

Once imported, you will need to go further and keep your DB and our indexes up-to-date. You can either:

  • Clear your index and re-import all your records hourly/daily with the previous methods:
    • non-intrusive,
    • not real-time,
    • not durable,
    • need to import your data into a temporary index, then atomically replace the original one once imported, if you want to keep your service running while re-importing (see the sketch after this list)

Or

  • Patch your application/website code to replicate every add/delete/update operation to our API:
    • real-time,
    • consistent & durable,
    • a little intrusive to some people, even though it is only a few lines of code (see our documentation)
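For the first option, here is a sketch of the temporary-index approach, assuming the algoliasearch Ruby gem and its move_index helper, with a Rails Product model (all names are placeholders):

    require 'algoliasearch'

    Algolia.init(application_id: 'YourApplicationID', api_key: 'YourAdminAPIKey')

    # Rebuild everything in a temporary index, then atomically swap it with the live one
    tmp_index = Algolia::Index.new('products_tmp')
    Product.find_in_batches(batch_size: 1000) do |batch|
      tmp_index.add_objects(batch.map(&:attributes))
    end
    Algolia.move_index('products_tmp', 'products')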

Introducing connectors

Even though we recommend modifying your application code to replicate all add/delete/update operations from your DB to our API, this should not be the only option, especially when testing Algolia. Users want to be convinced before modifying anything in their production-ready application/website. This is why we are really proud to release two open-source connectors: a non-intrusive and efficient way to synchronize your current SQL or MongoDB database with our servers.

SQL connector

GitHub project: algolia/jdbc-java-connector (MIT license, we love pull requests :))

The connector starts by enumerating the table and pushing all matching rows to our server. If you store the last modification date of a row in a field, you can use it to send all detected updates every 10 seconds. Every 5 minutes, the connector synchronizes your database with the index by adding the new rows and removing the deleted ones.

If you don't have any updated_at field, you can use:

The full list of features is available on GitHub (remember, we ♥ features and pull requests).

MongoDB connector

GitHub project: algolia/mongo-connector

This connector was forked from 10gen-labs' official connector and is based on MongoDB's operation log (oplog). This means you will need to start your mongod server with a replica set; basically, you need to start your server with: mongod --replSet REPLICA_SET_IDENTIFIER. Once started, the connector replicates each addition/deletion/update to our server, sending batches of operations every 10 seconds.

The full feature list is available on GitHub (we ♥ features and pull requests).

Conclusion

Helping our users onboard and try Algolia without writing a single line of code is not only a way to attract more non-technical users; it is also a way to save time for our heavy-tech – and overbooked – users, allowing them to be convinced without wasting their time before really implementing it.

Those connectors are open-source and we will continue to improve them based on your feedback. Your feature requests are welcome!