
API Documentation 2.0: From Browsing to Search

A challenge that documentation writers face is how to encourage developer readers to stay in the documentation and not reach out to support until absolutely necessary. To meet this challenge, writers of course focus on the text: its clarity, coherence, and exhaustivity. They also focus on form, devising ways to guide developers to the relevant text: structuring the information, highlighting key words, hyperlinking, creating menus and submenus, thinking up cool titles. Algolia adds search to that effort.

Algolia’s new integrated search panel

From browse to search

Developers come to documentation seeking answers, hoping to find what they are looking for. They browse, read, and browse a bit more. Often they find what they need, but not always. Some will eventually contact support for more personalized guidance, which may send them back to the documentation, but this time to the exact paragraph or code sample they were looking for.

Algolia’s documentation is about search: to wit, how to use our search API. So we thought: if our users can’t always find what they need using our own search bar, and worse, learn later that what they were looking for was actually present in the documentation, what sort of message are we sending about our API?

So we’ve faced this challenge head-on with an example of Algolia’s powerful search engine — by expanding our current search bar into a fully-integrated search panel.

documentation search panel

The new search panel is designed to be prominent and conversational, so that whenever our developers ask themselves What is or How to or Why, they simply type the remaining part of their question in the search bar, and our new panel lights up with the answers they are looking for.

Adopting best practices

Overall, our UI/UX model was Google. We adopted a Google-like feel, but used our own search engine + the knowledge of our own documentation to drive the whole user journey from question(s) to answer(s).

We also believe that search is a bridge between customer support and documentation. That’s why we included support discussions from our public forum in the new search panel. Now, when you search our documentation you’ll also be looking into our support history. That way, you get a side-by-side view of all relevant information about our API — relevant texts, code snippets, and all support tickets.

This time our model was Google + Stack Overflow, that well-known dynamic duo that has saved every developer from the great unknown. Stack Overflow, and more generally community-driven support, have become essential to the developer experience. By integrating our own developer community into our documentation, we will be giving our developers that same standard — and maybe even more, given that we know our own support data and can therefore fine-tune the search results.

Finally, taking this Google/Stack Overflow model a bit further, we decided to display code samples in the search results. Many developers come to our docs with a very specific question in mind; for them, finding a well-written line of code is often the best, most direct answer. So we added a toggle button to switch between text and code, allowing developers to search only for code.

With these features in place — a prominent search panel, integrated support, and code searching — we hope to extend the trust with our readers, so that they keep coming to our documentation expecting a useful experience.

We are also backing up our efforts with analytics: real metrics that will help us follow developers from query to query, page to page, and even from support to documentation. That kind of feedback loop will tell us how we can shorten the reading process and make it more pleasant, and it can also indicate how we can encourage our doc readers to use more advanced features, to push our API to its limits, which benefits everybody.

And we won’t stop at analytics. Because the challenges — to write clear, coherent, exhaustive, and easy-to-find information — will never go away, we will need to keep improving by focusing on different kinds of search strategies that work particularly well for natural language documentation.

Search in more detail

…or more specifically — what strategies did we use to ensure that our readers find what they are looking for?

In a nutshell: a successful document-based search relies in large part on how you organize your content. Global sections, individual pages, headers / sub-headers, and paragraphs: these are only some of the textual elements that, when done consistently and ordered logically, matter a lot. In our case, with a well-thought-out and cohesive collection of texts, Algolia’s speed and relevance work out of the box.

Another focus is on the query itself. The search engine can, for example, behave differently depending on the specificity of the query: for simple, general queries (like “index” or “install”), the content can stay high-level. For longer or more precise queries (like method names, or longer sentences), we can switch the results into more low-level API references and concepts.
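The idea of switching result types by query specificity can be sketched as follows. This is only an illustration: the regular expression, the four-word threshold, and the function name `result_level` are assumptions made for this example, not a description of how Algolia's engine or our search panel actually decides.

```python
import re

# Hypothetical pattern for "method-like" tokens: camelCase, snake_case, or name().
METHOD_LIKE = re.compile(r"[a-z]+[A-Z]\w*|\w+_\w+|\w+\(\)")


def result_level(query: str) -> str:
    """Return 'high-level' for short, general queries and 'low-level' otherwise."""
    words = query.split()
    looks_like_code = any(METHOD_LIKE.fullmatch(word) for word in words)
    # Assumed threshold: four or more words counts as a "longer sentence".
    if looks_like_code or len(words) >= 4:
        return "low-level"
    return "high-level"


print(result_level("index"))
print(result_level("setSettings"))
print(result_level("how do I filter at query time"))
```

A real implementation would likely tune these signals against actual query logs rather than hard-code them.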

Searching structured data

Let’s look at what Algolia does best — searching structured data. Here is an example of a T-shirt inventory. If the user is looking for a “slim red T-shirt with nothing on it”, you can help them find it by filtering:

Product: T-shirt
Color: red
Design: blank
Type: slim

If the user types in “T-shirt”, they get the whole inventory (1M records). If they add “red”, you divide the 1M T-shirts by 5 (let’s say there are 5 colors), leaving 200,000. If they add “slim”, you divide by 3 (there are 3 types: slim, wide, and stretch), leaving about 67,000. If they start adding other criteria, like “midriff”, “sleeveless”, “multi-colored”, and so on, you could conceivably reduce the relevant stock to 25 T-shirts. Not bad, from 1M to 25! And a good UI would make this process as easy as possible for the user.
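The narrowing effect of each added criterion can be sketched with a toy inventory. The 5 colors and 3 types come from the example above; the list of designs, the even distribution of stock, and the helper name `refine` are assumptions made purely for illustration.

```python
from itertools import product

colors = ["red", "blue", "green", "black", "white"]  # 5 colors, as in the example
types = ["slim", "wide", "stretch"]                  # 3 types, as in the example
designs = ["blank", "striped", "logo", "printed"]    # hypothetical designs

# Toy inventory: one record per combination, so stock is evenly distributed.
inventory = [
    {"color": c, "type": t, "design": d}
    for c, t, d in product(colors, types, designs)
]


def refine(records, **filters):
    """Keep only the records whose attributes match every given filter."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]


all_shirts = refine(inventory)
red = refine(inventory, color="red")
red_slim = refine(inventory, color="red", type="slim")
red_slim_blank = refine(inventory, color="red", type="slim", design="blank")

print(len(all_shirts), len(red), len(red_slim), len(red_slim_blank))
```

Each filter divides the remaining set by the number of values of that attribute, which is the same arithmetic that takes 1M records down to a handful.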

All this works as described when the content in which you are looking contains very clearly defined items. The discrete clarity of commercial products is what lies behind the success of structured data searches.

But not everything is so discrete. The English language has an unlimited number of ambiguities, so creating categories for natural language is not a scalable solution.

Unstructured text — from search to research?

Let’s now take a look at two queries that are difficult to structure in the way described above.

Example 1— legal text

Let’s switch subjects to better illustrate the point. Let’s say a lawyer types “out of order” in a legal database that contains court cases, laws, and legal journals. For this query, there are at least 4 relevant categories of documents, with each category containing 1000s of documents:

  • A database can be “out of order”, causing system failure (computer law)
  • An elevator is “out of order”, causing monetary loss (commercial law) or personal injury or death (torts law)
  • A lawyer is “out of order” in a courtroom (procedural law)
  • A factory produces widgets “out of order”, breaking contractual terms (contract law)

The lawyer clearly needs to signal to the search engine which of these categories is relevant.

Example 2 — API documentation

It would be the same if a developer were to come to Algolia’s documentation, search for “indexing filters”, and find two categories of documents:

  • the back end (creating filters in your data)
  • the front end (filtering at query time)

and four formats:

  • concepts
  • tutorials
  • code snippets
  • API reference

The developer will want to have control over both the subject and format of the documents retrieved. I’ll use the term “researchers” for our confused lawyers and developers above.
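One way to give the developer that control is to turn their choices into a boolean filter expression. The sketch below builds such a string; the attribute names `subject` and `format` and the function `build_filters` are hypothetical, and the `AND` / `OR` syntax is modeled on the boolean style of Algolia's `filters` search parameter, which should be checked against the API reference before use.

```python
def build_filters(subject=None, formats=()):
    """Build a boolean filter string from a subject and a set of formats."""
    clauses = []
    if subject:
        clauses.append(f"subject:{subject}")
    if formats:
        joined = " OR ".join(f"format:{fmt}" for fmt in formats)
        # Parenthesize only when there are several alternatives.
        clauses.append(f"({joined})" if len(formats) > 1 else joined)
    return " AND ".join(clauses)


print(build_filters("backend", ["tutorial", "snippet"]))
```

The resulting string restricts results to one subject while still allowing several document formats.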

As-you-think

Let’s go back to the T-shirt example to see if that can help here. That example was about one item: the consumer is searching for one thing, and the quicker they find it the better.

At the other extreme are researchers, who are often not looking for one thing. Their goal is to think about a subject, to get a better understanding, and to construct and support an argument. They have to be patient. If they are searching a site with 1M documents, they are ready to scan through 1000s of results (in say one or two hours, or days, or longer), and to read 100s of documents. We are clearly not talking about consumers.

Developers fall somewhere between these extremes. Sometimes they know more or less what to look for and so are searching for one thing — for example, a method or a specific setting. Other times they don’t really know what they are looking for: they might be onboarding, or trying to solve a difficult problem, or looking to enhance their current solution. In this case, they are more researcher than searcher.

But even here, we don’t want to waste a researcher’s time with irrelevant results. And we surely don’t want to fail them by leaving out important results (this is the difficult balance of precision and recall).
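For readers less familiar with the two terms: precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned. A minimal sketch, using made-up document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved ids and relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: 4 results returned, 5 documents actually relevant, 2 in common.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6, 8, 10])
print(precision, recall)
```

Returning more results tends to raise recall and lower precision, which is why the balance is hard to strike.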

Essentially, we want researchers to have the same quality results — and the same confidence in Algolia — that our consumer clients have. 

And so the challenge is clear. How do we structure our “unstructured” documentation to come up with consumer-grade results?

Search strategies for documents

Behind our new search panel — what we’ve done

Algolia’s power lies in structuring the data before searching. To put this into action, we focused on four key areas:

  • Organizing content — the way that our documentation is organized (within the page as well as the combination of all the pages) is probably the most important step
  • Indexing — structuring the text for search and relevance
  • Relevance — testing out numerous queries and engine configurations, to ensure a consistently good relevance
  • UI/UX — the developer experience: how to encourage our readers to use and to keep using our integrated search panel. (Although this is of equal importance, we do not describe how to implement the InstantSearch UI)

Our indexing and relevance strategies follow our DocSearch methodology, which has been well documented by our CTO in a previous blog post on The Laravel Example. There he describes:

  • How to structure your texts with sections, headers / subheaders, and small paragraphs
  • How to order the overall content so that some information is treated as more important than other information
  • How to tweak our tie-breaking algorithm using custom ranking (again, relevance)
  • How to configure Algolia with filters, facets, synonyms, and many other settings.
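To make the first two points concrete, here is a rough sketch of turning a page into small, searchable records that each carry their heading hierarchy and their position on the page. The record shape, the element kinds (`h1`, `h2`, `h3`, `p`), and the function name `build_records` are assumptions for illustration, not the actual DocSearch record format.

```python
LEVELS = ["h1", "h2", "h3"]


def build_records(elements):
    """Turn (kind, text) pairs into records that remember their heading context."""
    hierarchy = {level: None for level in LEVELS}
    records = []
    for position, (kind, text) in enumerate(elements):
        if kind in LEVELS:
            hierarchy[kind] = text
            # A new heading invalidates any deeper headings seen before it.
            for deeper in LEVELS[LEVELS.index(kind) + 1:]:
                hierarchy[deeper] = None
            content = None
        else:
            content = text
        # Position can later serve as a ranking signal (earlier = more important).
        records.append({**hierarchy, "content": content, "position": position})
    return records


page = [
    ("h1", "Indexing"),
    ("p", "Intro"),
    ("h2", "Filters"),
    ("p", "Create filters in your data"),
    ("h1", "Search"),
    ("p", "Filtering at query time"),
]
records = build_records(page)
print(records[3])
```

Because every paragraph record knows which headers it sits under, a match in a header can be ranked above a match buried in body text.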

A recent feature not mentioned in the post is our extensive use of Query Rules to handle special use cases like specific products or coding language queries.

There is, of course, the matter of documentation tools. We have written about that in a separate post.

Exploring what’s next

Searching through thousands of documents is not an exact science. There are many pitfalls, and though we’ve solved many of them, it’s hard not to wonder: what happens when there are 1,000,000+ documents? Here are some interesting features not yet implemented.

A word cloud filter

Algolia offers complete filtering out of the box, but we rely on our users to define the most effective filters for their information. One way to do that is to use word clouds. Word clouds, in this context, are a set of filters that act as one global filter. For document-intensive searching, word clouds can be quite powerful.

For example, we can help resolve the above lawyer-researcher’s “out of order” ambiguity by using word-cloud filtering:

word cloud filtering

As you can see, the four word clouds above match the four distinct areas of law mentioned in the “out of order” example. Normally, a filter is one word: by presenting a set of related keywords within a single frame/word cloud, we offer the user more information to help choose the best filter. And by making these word clouds clickable (as seen below), the user can think-by-clicking, to test which set of words most closely matches his or her train of thought.

There are many ways to build word clouds, one of which is to scan each document using specialized dictionaries, to pick out keywords that make the document unique — and to do this before indexing them. For the example above, you would use different specialized legal dictionaries. For our API docs, we would use our own API reference pages as dictionaries for each new document added to our documentation.
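The dictionary-scanning approach might look like the sketch below, run on each document before indexing. The tiny legal dictionaries, the sample text, and the function name `word_cloud` are invented for the example; real specialized dictionaries would be far larger.

```python
import re
from collections import Counter


def word_cloud(text, dictionary, size=5):
    """Return the most frequent dictionary terms found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token in dictionary)
    return [word for word, _ in counts.most_common(size)]


# Hypothetical, heavily abbreviated specialized dictionaries.
torts = {"injury", "negligence", "elevator", "damages"}
contract = {"contract", "breach", "terms", "delivery", "widgets"}

text = (
    "The elevator was out of order. The injury led to damages; "
    "negligence caused the injury."
)
torts_cloud = word_cloud(text, torts)
contract_cloud = word_cloud(text, contract)
print(torts_cloud, contract_cloud)
```

A document that yields a rich cloud for one dictionary and an empty one for another is a good candidate for the first theme's filter.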

Thematic frequency

Some documents are so similar in terms of relevance that it is impossible to know which should be presented first. At this point, the engine needs help. With structured data, such as shopping items, this is achieved through custom ranking, using metrics like “most popular” or “most sold”. However, using metrics is not always relevant for texts. For example, we can use “most cited” or “most read”, but these metrics are often irrelevant to a researcher.

So, why not create front end tools that help researchers — documentation readers themselves — choose between different ways to break ties?

Below is one such tool, which implements thematic frequency, a shorthand term for the classification of documents by theme. Each document can be placed in one or more themes based on how close (or far) its content is from the theme. The themes are represented by word clouds. Documents can be scored using the thematic word clouds by matching the document’s content with the keywords contained in the word clouds. Later, filtering can use that scoring to both find and order the documents.

For example, here’s a subset of results for the theme “server-side indexing and filtering”, in the order returned by Algolia:

a subset of results for the theme “server-side indexing and filtering”

The UI can offer the ability to switch between rankings:

  • Keep the ranking returned by Algolia’s tie-breaking algorithm.
  • Adjust the ranking with Algolia’s custom ranking feature, ordering the list by “most read” or “most cited”.
  • Add thematic frequency on top of Algolia’s ranking algorithm, reordering the results according to the strength of the document’s relationship to the active theme.

By choosing the last option – thematic frequency – the researcher could reorder the results from (1, 5, 20) to (20, 1, 5), because record 20 contains the largest number of thematic keywords. In other words, document 20 goes to the top because it is more consistent with the theme of “server-side indexing and filtering” than documents 1 and 5.
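A minimal sketch of that reordering, assuming the thematic score is simply the number of theme keywords found in a document. The record ids match the example above, but the texts, the keyword set, and the function names are made up for illustration; this is not an existing Algolia feature.

```python
import re


def thematic_score(text, theme_keywords):
    """Count how many tokens of the text belong to the theme's word cloud."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in theme_keywords)


def rerank(results, theme_keywords):
    """Reorder results by thematic score; ties keep the engine's original order."""
    return sorted(
        results,
        key=lambda r: thematic_score(r["text"], theme_keywords),
        reverse=True,
    )


# Hypothetical word cloud for the theme "server-side indexing and filtering".
theme = {"server", "indexing", "filtering", "facets", "records"}

# Results in the order returned by the engine (ids 1, 5, 20).
results = [
    {"id": 1, "text": "Filtering at query time with facets"},
    {"id": 5, "text": "Indexing overview"},
    {"id": 20, "text": "Server-side indexing: push records, then set filtering attributes on the server"},
]
reranked_ids = [r["id"] for r in rerank(results, theme)]
print(reranked_ids)
```

Since Python's sort is stable, documents with equal thematic scores stay in the order the engine returned them, so the thematic layer only overrides the ranking where it has something to say.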

These bonus strategies, as well as many others, will keep us – and hopefully our readers – confidently within the conversational search powered by Algolia.

We look forward to your feedback on the effort we’ve put in so far, and on future ideas: @algolia, @codeharmonics.