C/C++ is still the only way to have high performance on Mobile

When it comes to programming languages and performance, you can read all and its opposite on the web. It’s definitely a very controversial topic!

[Edit 15-Nov-2012] I had questions on reddit about the data-structures and algorithms we used. We develop an embedded search engine for mobile, and tests were done on our own data-structure that is far more efficient than SQLite or other platform options for this use-case. [/Edit]

For Algolia the story started when researching an instant suggest algorithm. I used Java for two reasons:

  • Main reason: our first client was using Java on Google App Engine
  • Secondary: at that stage, I was doing a lots of refactoring and Eclipse is very efficient for these tasks

Once our algorithm was designed, I started to optimize performance on a desktop computer (core I7 950). For this, I indexed all titles of the English version of Wikipedia (4 millions titles). I optimized the Java code mainly by reducing the number of allocations. All instant suggest queries were then faster than 10ms.

As we planned from the begining to support iOS and Android, I needed to optimize for high performance on mobile. I then ported the Java code to Android and ended up with a few modifications (we needed to support old Android versions which have not a full support of Java SDK).

In order to test performance, I created a small “demo” app: CitiesSuggest, a mobile application based on Geonames database to look for a city name. I filled the index with all places with a known population. In the end, the database contained 270 000 city names.

As could be expected, the resulting application was quite sluggish on my Galaxy S2. The worst queries could take more than one second of CPU.

I then applied all possible methods described in the Anroid documentation and was able to reduce the response time to 700ms for the longest queries. This is better but still gives end-users an impression of sluggishness! Fortunately, common queries worked well enough to present our very first demos at LeWeb’12 London. I was subsequently able to improve the user experience a lot by adding asynchronous support. At that point, we decided to start implementing the objective-C version for iOS.

I started a new implementation fully done in objective-C from scratch without any premature optimization. I then developed the same CitiesSuggest application for iOS. I first got stuck with some basic UI stuff. For example I needed to reimplement an UILabel that supports highlighting! The standard UILabel does no support this, and other implementations like Three20 just have too many dependencies (you can download AlgoliaUILabel on github, I released  the code under Apache licence). Once the UI was ready, I could see the actual performance was catastrophic! Instant suggest queries were between 200ms and 10 seconds on an iPhone 4S!

After profilling, I discovered that 95% of the time was spent in Objective-C messaging and ARC. Actually, I have millions of calls to very small methods with 1 or 2 lines of code, and I found a very good explanation in the internal of objc_msgSend (mainly on mikeask.com). There is in fact a hash table lookup behind objc_msgSend! This explains most of my performance problems.

To fix these problems, I started to replace all this low level objective-C implementation by a C++ implementation with inlined methods. Eventually, I finished with Objective-C code for high level classes while all the core was C++.

This change has dramatically improved performance with instant-search queries between 2ms and 90ms on a iPhone 4S. I struggled to find complex enough queries to reach 90ms! This resulted in a very nice user experience with a remarkably slick demo 🙂

After this succes, I decided to use the same C++ code on Android with Android NDK. With this approach I reduced our maximum query time from 700ms to 80ms on a Galaxy S2. But I was really disappointed by the Android display, as I did not reach the same level of interactive experience that I had with iOS. The display of results stays slower on Android, probably because I did not spend enough time to optimize this part.

In the end, I lost a lot of time with Java and Objective-C trying to optimize code when the real solution was to use C/C++. I am fully convinced that it is just not possible to reach the same speed with Objective-C only or Java only.

And there is also another good property with C++ code: Blackberry supports C++, and there is large chance that Windows Phone 8 SDK will support C++ when released in a few weeks. Actually, I do not see any other alternative than C/C++ when you are looking for performance on mobile devices, which is our case 🙂

I hope this article will prevent you from spending as much time on Java/Objective-C optimizations as I did.

  • This blog post is on reddit !

  • Ciprian Mustiata

    Sorry, but did you try the Mono for Android? I mean performance is obviously relatively comparable to C++. I agree that Android’s lower end optimizing JIT is way slower than a LLVM based compilation backend. The single downside of Mono for Android/MonoTouch is that the IDE + platform costs money, but as for me performance was not an issue. And if you think about Windows Phone 8 SDK, they are smart enough to use a static compiler too (using the cloud to compile) so it should offer a good performance too.

    • No we haven’t tried Mono for Android. The big advantage of C/C++ is that your optimization is working for Android, iOS and Windows Phone 8.

      In fact we have also done a lot of optimizations that are not possible in C# (code inlining, reduction of page faults on a mmaped file, ….).

      • Ciprian Mustiata

        Code inlining is done in may ways by the backend optimizer, isn’t so? And as Android from SDK 4.0 uses LLVM, as Mono for Android, if they match good patterns they both are inlined. And don’t take my word for it, Microsoft blog announced that both Mono and .Net offers it: http://blogs.microsoft.co.il/blogs/sasha/archive/2012/01/20/aggressive-inlining-in-the-clr-4-5-jit.aspx I fully agree that are optimizations that are C++ only, as I think there are optimizations that work smooth only only (like creating of small objects in a fairly cheap fashion, if for example you will have a parsing). Also, JIT on demand can create some nice code cases (as Mono for Android is statically compiled, but new code is JIT-ed) so you can supposedly create an expression to evaluate a record in your DB as working, and at the end JIT it (using for example C# hosted compiler) so you will have a “native” evaluator. What I mean, that the title “… C++ the only way” is it true to some extent, but still it matter a lot on your application and coding style.

    • dD

      Oh yeah, “relative comparable to C++” on what scale factor? Raw number crunching C# mono is at its best twice as slow as C++, 5-6 times slower on average including up to 10+ times slower at specific workloads. On top of what you have to pay for it. Plus it is far from Qt in the graphics area.

  • Hi Julien,

    C is just one level above assembly language (so it’s normal for it to be much faster) – although I have to wonder how it works within Android.

    Is there a guide that explains how to use C/C++ in Android? It would be very interesting to try something like this to see myself.

  • Rigoberto

    Was it possible to avoid the use of the hash table while still use objective C?

    • Yes there is one but it is painful compare to using C++, and you have no inlining:

      SEL theSelector = @selector(methodName:);
      IMP theImplementation = [self methodForSelector:theSelector];
      theImplementation(self, theSelector);

  • I agree that for really low-level performance-sensitive code, like your search engine, you’ll want to go down to the metal (which today means C or carefully-limited C++; I remember the days when C was derided and you had to go down to assembly for “real” work.)

    But most people aren’t writing code like that, so your experience — while valid for your project — is of limited use in general. In most application code, performance bottlenecks come from poor choice of algorithms, overuse of memory, or lack of concurrency … not from raw CPU limitations. I’ve been writing Objective-C code for 13 years and it’s quite rare in my experience for message-sends to be more than a blip in profile results. (And where they are, you can rewrite that piece of code in C/C++ without having to abandon Obj-C for the rest.)

    It sounds from your description (“without any premature optimization”, “millions of calls to very small methods with 1 or 2 lines of code”) as though you’re very used to C++ and inline methods, or at least an inlining JIT. As with any other language, it takes time to learn the right way to write Objective-C code to make it work well, especially on a mobile platform.