A Blog About Programming, Search & User Experience

Why Android APK Format is a Mistake – The Algolia Blog

Design flaw in the Android APK format. When I started developing for Android, an APK file looked to me like just an archive, a simple approach found in many systems today. Files are extracted from the archive at installation and you can access them via the file system.

This seemed even more reasonable since Android uses Linux, which adheres well to POSIX standards.

But I was completely wrong! An APK is not a mere archive: the application starts from and uses the APK at runtime! This is a horrible decision that will probably hurt Android for a long time…

[Edit 28-Jan-2013] The goal of this post was to express my point of view about the drawbacks of using the APK file directly at runtime versus relying on the file system. I used memory-mapped files to illustrate this, but the post is incorrect on that topic. There is in fact a way to memory-map a file directly from the APK: you can use an extension for which files are stored uncompressed inside the APK (mp3, jpg, …) and use AssetManager.openFd() or Resources.openRawResourceFd() to get the offset/length inside the APK file.

All my thanks to Jay Freeman for his excellent feedback. His comments helped me to understand my mistake and to improve our Android integration!
[/Edit]
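
As a rough illustration of the offset/length idea in the edit above, here is a minimal desktop-Java sketch (not Android code). It assumes an ordinary zip archive stands in for the APK: it writes one entry with the STORED (uncompressed) method, reads the zip local file header to find where the payload starts, and maps only that region. The class name `StoredEntryMapDemo`, the helpers `writeStoredZip` and `mapFirstStoredEntry`, and the entry name `geonames.idx` are hypothetical. On Android, the file descriptor returned by `AssetManager.openFd()` already carries the offset and length, so no header parsing is needed there.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StoredEntryMapDemo {

    // Writes a zip archive whose single entry is stored without compression.
    static void writeStoredZip(Path zip, String name, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        // STORED entries need size, compressed size and CRC set up front.
        entry.setSize(payload.length);
        entry.setCompressedSize(payload.length);
        entry.setCrc(crc.getValue());
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(entry);
            zos.write(payload);
            zos.closeEntry();
        }
    }

    // Maps only the payload of the first entry, assuming that entry is STORED.
    static MappedByteBuffer mapFirstStoredEntry(Path zip) throws IOException {
        try (FileChannel ch = FileChannel.open(zip, StandardOpenOption.READ)) {
            // The fixed part of a zip local file header is 30 bytes, little-endian.
            ByteBuffer header = ByteBuffer.allocate(30).order(ByteOrder.LITTLE_ENDIAN);
            int read = 0;
            while (read < 30) {
                int n = ch.read(header, read);
                if (n < 0) throw new IOException("truncated zip header");
                read += n;
            }
            if (header.getInt(0) != 0x04034b50) throw new IOException("not a local file header");
            if (header.getShort(8) != 0) throw new IOException("entry is compressed");
            long length = header.getInt(18) & 0xFFFFFFFFL;   // compressed size == raw size
            int nameLen = header.getShort(26) & 0xFFFF;
            int extraLen = header.getShort(28) & 0xFFFF;
            long offset = 30L + nameLen + extraLen;          // payload starts after the header
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "index-bytes-go-here".getBytes(StandardCharsets.US_ASCII);
        Path zip = Files.createTempFile("demo", ".zip");
        zip.toFile().deleteOnExit();
        writeStoredZip(zip, "geonames.idx", payload);
        MappedByteBuffer mapped = mapFirstStoredEntry(zip);
        byte[] seen = new byte[mapped.remaining()];
        mapped.get(seen);
        System.out.println(new String(seen, StandardCharsets.US_ASCII));
    }
}
```

Only the bytes of the entry are mapped; the rest of the archive is never touched. A compressed (DEFLATED) entry could not be handled this way, because its on-disk bytes are not the bytes the application wants to read.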

What is the Problem with the APK format?

Let’s look at our own example. At Algolia, we have designed an efficient binary data structure that can answer instant-search queries in just a few milliseconds, even with a very big data set (for example, all the world’s city names from Geonames: 3M+ entries). This data structure is designed to be used directly on disk without being copied into memory. To obtain optimal performance, we use a memory-mapped file, which is standard on all platforms, especially on Linux.

We have been able to use memory-mapped files on all platforms, except on Android! In fact you can only retrieve an InputStream from a file packaged in an APK. So the only way to use a memory-mapped file is to copy the resource from the APK to disk and then use the file system. This seems like re-implementing an installer in each application.
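
The copy-then-map workaround described above can be sketched in plain Java as follows. This is only an illustration under assumptions: `CopyThenMap` and `extractAndMap` are hypothetical names, and the `InputStream` here is an in-memory stand-in for whatever stream the platform hands you for a packaged resource.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class CopyThenMap {

    // Copies the stream to `target` (the slow first-run step), then maps the copy read-only.
    static MappedByteBuffer extractAndMap(InputStream resource, Path target) throws IOException {
        Files.copy(resource, target, StandardCopyOption.REPLACE_EXISTING);
        try (FileChannel ch = FileChannel.open(target, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeIndex = {10, 20, 30, 40};   // stand-in for the packaged index
        Path target = Files.createTempFile("index", ".bin");
        target.toFile().deleteOnExit();
        MappedByteBuffer index = extractAndMap(new ByteArrayInputStream(fakeIndex), target);
        System.out.println("mapped bytes: " + index.remaining());
    }
}
```

The cost of this approach is the extra copy: the data exists twice on the device, and the first launch has to wait for the extraction to finish.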

Is the APK so bad? Why did they design it this way?

I imagine that Android developers chose this approach to solve some pitfalls of file-systems. I can think for example about solving performance problems when you have a lot of small files in one folder, or reducing the size of applications on the device (resources are compressed in the APK and decompressed only when the application uses them, which actually contributes to the sluggish image of Android).

I may of course be wrong, there may be other more important reasons for this approach. But if not, Android should have thought more about the consequences of their choice: in the long term, the APK constraints are more serious than those small pitfalls that could have been solved in other ways.

But wait… Android applications can contain dynamic libraries (.so files) via NDK. Isn’t it the principle of dynamic libraries to be memory-mapped? In fact I am pretty sure they discovered this problem when working on NDK since dynamic libraries are automatically extracted from APK file at installation and stored in an application directory in ‘/data/data’. I am wondering why they decided to implement this hack instead of fixing the problem…

Conclusion

Developing an API, an SDK or worse, a whole platform, is extremely difficult. Let’s face it, it’s unavoidable to ship some badly designed components or inconsistent APIs. We definitely need to listen to developers’ feedback even when it hurts. Actually, the real difficulty comes when it’s time to put things right without alienating existing users!

By the way, if you know more about APK design choices, I’m interested to hear from you!

  • Kate

    Is it possible to copy the file to SD/main memory on first run?

    • http://www.algolia.com/ Julien Lemoine

      Yes, this is possible and is the only way I know to work around the APK limitation.
      The consequence is that your first start can be long, and you will need a waiting screen to give feedback to users.

  • Alberto Torres

    Correct me if I’m wrong, but I’m pretty sure you can have a file in the APK without zip compression, and you can natively read the APK itself! The reason the .so files are extracted is because they’re compressed by the SDK, and therefore they can’t be memory-mapped.

    • http://www.algolia.com/ Julien Lemoine

      Yes you can have uncompressed files in your APK (for example if your file is named .jpeg, .gif, .mpg, .mp3, …) but this does not mean you can memory-map this file. The only solution I can imagine would be to memory-map the whole APK file and try to interpret APK to find the part of APK that contains the given file, but it seems very dangerous.

      • http://www.saurik.com/ Jay Freeman (saurik)

        Instead, you use AssetManager’s openFd, which will return a file descriptor and an offset into that file where you can find the asset data. You do not need to “try to interpret APK to find the part…”, that is simply given to you. This is a common use case with audio files.

        • http://www.algolia.com/ Julien Lemoine

          You are right, this is the best workaround possible. It is still a workaround, since it needs to memory-map the whole APK file.

          In our case this is not really acceptable, since we develop an SDK that developers embed in their apps, and this would put constraints on their app if they want to embed an index.

          • http://www.saurik.com/ Jay Freeman (saurik)

            I do not understand why this would possibly be a concern.

            You do realize that there is no cost to memory mapping a very large file, right? You still are going to end up with only a single VMA (“virtual memory area”, one of the more insidious costs of memory maps) in your memory map, and only pages you use will get paged in; if you wanted to memory map a 1GB large file, the only pain would be the loss of address space, and you have 2-3GB of that available.

            Additionally, you aren’t even correct that you have to memory map the entire file: as you have a file descriptor for the file and know the offset you want into the file, you can memory map just the part of the file you need. Both the C-level mmap API and the Java-level FileChannel.map API allow you to provide a specific offset and length for the region you want to map from the file.

          • http://www.algolia.com/ Julien Lemoine

            From a pure technical point of view, you are right.
            But imagine you are a non-expert client of our lib: just asking you to package an index as an mp3, jpeg (or whatever extension is not compressed) file and to pass the whole context so the lib can do all the mmap magic behind the scenes seems crazy, no?

          • http://www.saurik.com/ Jay Freeman (saurik)

            Your arguments have all been technical arguments, though: you started with a complaint that the file format was fundamentally inefficient for your use case as it isn’t possible to memory map out of the file, and then switched to this weird complaint that you’d have to memory map “the whole file”, as that would somehow “constrain” the app.

            The fact is that there is simply not a technical issue here: what you are trying to do is fully supported and will cause no loss of efficiency. Android really does assume that people (including their UI frameworks) will want to memory map files directly out of the APK distribution format, and has gone out of their way (zipalign) to make this quite easy.

            Now you seem to be arguing that the user of your library will care how your library works: no, as a user of various libraries, I really don’t care how much “magic” they have to do behind the scenes; especially in this case, as you already claim your technical advantage includes understanding how mmap works (ironic, huh? ;P).

            Yes: I will have to give you a reference to my Context; however, that’s really how Android APIs usually work: almost every API of merit–including the entire UI system–is based on being given a Context and a set of resource identifiers. This sounds like a great way for your library to work as well: I give you a Context, and the resource ID of the index.

            What this leaves is the idea that the user will be horribly confused by having to rename the index they generate with some unrelated file extension. FWIW, I agree: but that’s not a limitation of the APK file format, and it certainly doesn’t mean that APK itself “is a terrible mistake”; instead, it is a problem with the interaction of aapt and ADT.

            So, instead of this massive public rant about how the APK file format is fundamentally inefficient, how it was “a terrible mistake”, how the developers who designed it “should have thought more about the consequences of their choice”, you should really just be ranting about how ADT doesn’t make it easy to pass arguments to aapt.

            (Or, you know, not ranting at all: I maintain that this would have made a great question on StackOverflow. ;P)

          • http://www.algolia.com/ Julien Lemoine

            First of all, thank you for your comments; you took the time to reply and give technical arguments. I am very sorry that I did not take the time to write a proper answer to your feedback before this one.

            As I explained, I found this issue when working on the packaging of a binary file that must be memory-mapped within an Android app. I needed a solution that works with all Android versions and that can be used by any developer.

            To be fully transparent, I was focused on compressed resources because of the 1MB limit on uncompressed files in Android <= 2.3 (not acceptable in our case).

            I also agree with you that you can just fix APK (everything is possible with code…). One limit has already been fixed (the 1MB limit on uncompressed files) and I am sure they will fix the other issue related to the file extension.

            The goal of this post was to express my point of view. I am still convinced that it would have been better to rely directly on the file system; at least it would have avoided these issues.

            Btw I will add an edit at the beginning of the post, and once again thanks a lot for your comments.

          • http://www.saurik.com/ Jay Freeman (saurik)

            First, assuming this were actually true, it isn’t something wrong with the “APK format”; like, you say “you can just fix APK”, but the APK format isn’t the problem here, it is the thing that loads it: based on the concept of what an APK is supposed to be, you were able to construct that APK just fine… there are no changes to APK required for you to have that large file.

            The filename extension thing is definitely irritating, but it is a problem with aapt, not the APK format. You could again see an article talking about how it was an oversight to not allow developers to pass command-line arguments through to aapt from their Eclipse project, but that is certainly not an APK issue: the APK file format itself, and even the aapt tool, happily support passing random file extensions for purposes of “this is not compressed”.

            What I would do in this situation (having to tell developers to use weird filenames to make their projects easy to do in Eclipse) is file this issue (along with the trivial patch) against the ADT (again: just Eclipse is an issue here), and then write a technical blog post about how not only was this suboptimal but how you are the reason it got fixed. In the documentation, you could explain that a workaround until these patches hit (or for older SDK builds) is to use a silly extension.

            I also will happily agree that there are benefits to simply using the filesystem APIs, which would have allowed for the reuse of more C code from other systems that already is developed against things like “open” and “mmap”. However, this is a high-level API design tradeoff for which there is then an interesting discussion to be had about “is it worth it vs. what we lose elsewhere”.

            Regardless, this brings me to the yet-another technical constraint you are stating keeps you from using this feature, and this one still isn’t correct :(. The UNCOMPRESS_DATA_MAX limit from Android versions before 2.3 actually applies to the in-memory uncompressed buffer used for /compressed/ files. If you are getting that error, then your file is not being stored uncompressed: it is being stored compressed. If Android gave that error on files such as MP3 and PNG, it would have been unshippable, as those are /often/ larger than 1MB.

            Therefore, this continues to simply be “we didn’t pay enough attention to what the actual constraints here are”. On all versions of Android, you can ship an uncompressed file of arbitrary size in an APK, and that file will be capable of being easily mmap’d using high-level APIs provided by Google. Sadly, this does mean that if you use aapt to generate your APK, and you are running aapt from Eclipse, you will have to give the file a silly extension, as there is nothing in the Eclipse ADT UI that allows you to pass the “-0idx” argument to aapt that would let you use the .idx file extension for an uncompressed file. If you don’t believe me, I’ve provided the evidence from the Android source code tree below.

            But… that left the question “how are your current instructions working at all”: you claim that you are telling people to put the file in res/raw so it ends up in your APK as a compressed asset, and that thereby avoids that size limitation (but at the cost of requiring the file to be copied to a temporary file in order to be memory mapped). But, that should not actually help, as this error actually only happens if you use compressed files: storing the file compressed should actually cause this problem. I was thinking of going through your instructions, but apparently rather than shipping a simple Java command-line app that generates an index I’d have to install a CPU patch, set up the Android emulator, and do a bunch of work in Eclipse to get a static index (none of which I’d do even if I wanted to use your library: it would be critical for me to have fully automatable build processes ;P).

            However, what I could do, is just use the exact sample code you specified for how to extract the asset (with .openRawResource), and to just place a dummy file as you specified into res/raw. I did this on Android 2.1, with an asset that was larger than 1MB; and, as expected, it didn’t work if I saved it compressed by naming it “geonames.idx”: I got an IOException when I tried to read from the InputStream, and I got the “Data exceeds UNCOMPRESS_DATA_MAX” error. As also is now expected, renaming the file to “geonames.mp3” solved the problem. Uncompressed files really do work, at which point you can either stream them or memory map them; if you saved them compressed, however, in addition to getting the performance issue, it also will not work on Android 2.2 and below.

            Your instructions are thereby attempting to work around a bug in reverse and are actually /causing/ the bug you are trying to avoid. As in, you are now claiming that there is a technical failing of APK that you needed to work around, but in fact it is your workaround that causes the problem and the instructions you put on your website don’t actually work. You really just need to have your users store the assets in the file uncompressed (possibly submitting a patch to ADT to help them not use silly file extensions), and then use the openFd to let your library memory map the file. This really isn’t a problem with Android, but it apparently is still a bug in your library. ;P

            ========================

            Here is the actual code that was removed, and as you can see it is in _CompressedAsset::getBuffer, as the old code used to allocate a single massive memory block in which to uncompress the contents of the compressed asset. The change they made in Android 2.3-ish was to allow these compressed assets to be streamed as they are decompressed.

            @@ -817,12 +833,6 @@ const void* _CompressedAsset::getBuffer(bool wordAligned)
                 if (mBuf != NULL)
                     return mBuf;
            -    if (mUncompressedLen > UNCOMPRESS_DATA_MAX) {
            -        LOGD("Data exceeds UNCOMPRESS_DATA_MAX (%ld vs %d)\n",
            -            (long) mUncompressedLen, UNCOMPRESS_DATA_MAX);
            -        goto bail;
            -    }
            -
                 /*
                  * Allocate a buffer and read the file into it.
                  */

            Here is the commit message that removed this error; again, you can see that this change has to do with compressed assets, not uncompressed assets. If you still don’t believe me, I can go back to a 2.2 device and scrounge around until I find an actual shipped uncompressed asset that is larger than this cutoff (which I’m certain I will be able to, due to wallpapers and fonts).

            > Support streaming of compressed assets > 1 megabyte
            > Compressed assets larger than one megabyte are now decompressed on demand rather than being decompressed in their entirety and held in memory. Reading the data in order is relatively efficient, as is seeking forward in the stream. Seeking backwards is supported, but requires reprocessing the compressed data from the beginning, so is very inefficient.
            > In addition, the size limit on compressed assets has been eliminated.
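
To see why forward seeks are cheap and backward seeks are not, here is a small plain-Java sketch of the same idea using `java.util.zip`. It is not the Android implementation; `RewindingInflateReader` and its methods are hypothetical names, and the compressed data is held in a byte array for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class RewindingInflateReader {
    private final byte[] compressed;
    private InputStream in;
    private long position;   // current offset in the uncompressed data
    private int restarts;    // how many times decompression had to start over

    RewindingInflateReader(byte[] compressed) {
        this.compressed = compressed;
        reopen();
    }

    // Starts decompressing again from the first compressed byte.
    private void reopen() {
        in = new InflaterInputStream(new ByteArrayInputStream(compressed));
        position = 0;
    }

    // Moves to an absolute offset in the uncompressed data.
    void seek(long target) throws IOException {
        if (target < position) {      // backward: a deflate stream cannot be rewound
            reopen();
            restarts++;
        }
        while (position < target) {   // forward: decompress and discard
            if (in.read() < 0) throw new IOException("seek past end of data");
            position++;
        }
    }

    // Reads one uncompressed byte, or -1 at end of data.
    int read() throws IOException {
        int b = in.read();
        if (b >= 0) position++;
        return b;
    }

    int restarts() {
        return restarts;
    }

    // Helper that produces zlib-compressed test data.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[1000];
        for (int i = 0; i < raw.length; i++) raw[i] = (byte) (i % 100);
        RewindingInflateReader r = new RewindingInflateReader(deflate(raw));
        r.seek(500);                  // forward seek: no restart needed
        System.out.println(r.read());
        r.seek(10);                   // backward seek: forces a restart
        System.out.println(r.read() + " after " + r.restarts() + " restart(s)");
    }
}
```

An uncompressed, memory-mapped asset has none of this bookkeeping: any offset is reachable directly, which is the point being made about storing large assets uncompressed.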

            Here is a response to someone with this issue on the Android-Developers mailing list, where we see the workaround proposed is to store the asset uncompressed and memory map it, instead of relying on the compressed asset storage mechanism. This is also the workaround that people propose on sites like StackOverflow for this particular error message.

            > This is baked into the Android framework. It’s not something you can change unless you’re building your own platform.
            > If the data is uncompressed and aligned (with zipalign) it can be memory-mapped directly, which is easier on the system than dedicating a large piece of physical memory to hold the entire uncompressed file.

            https://groups.google.com/forum/?fromgroups#!topic/android-developers/YnHXirqT1-k

          • http://www.algolia.com/ Julien Lemoine

            Wow, I definitely missed a lot of important information when I worked on mmap on Android. Thanks a lot for correcting my mistakes.

            I will fix our lib and change the edit at the beginning of the post.

            Btw if you have some time to look at our product, your feedback will be more than welcome; you have already helped improve it with these comments.

    • http://www.saurik.com/ Jay Freeman (saurik)

      The .so files have to be extracted not because they cannot be memory mapped (I mean, if nothing else, the SDK could have been modified trivially): it is because the standard system executable loader is used, which is not capable of interpreting embedded files.

      • http://www.algolia.com/ Julien Lemoine

        I agree, I mixed two topics: having a file on the file system and mapping this file in memory. Having a file on disk is a strong requirement to reuse Unix dynamic library loading (via LD_LIBRARY_PATH). Memory mapping is then also required to reuse the loader as is.

        In the end, the APK constraints are not ideal for reusing existing tools and code that were built on file-system abstractions and POSIX properties.

        Btw I think an evolution of the APK format could be to have 2 parts:

        – a part that contains compressed data and stays on disk to launch the app
        – a part that is used only for installation: uncompressed files are extracted and stored on the file system.

        • http://www.saurik.com/ Jay Freeman (saurik)

          This is a fair point (and, in truth, this is my primary complaint about the way asset management is handled on Android: that it makes it difficult to reuse existing Unix tools, as the entire concept is incompatible with the standard filesystem APIs); however, for your specific use case (you need to memory map a single file out of the APK) they have you covered quite well.

          On the other side, the way Android is currently set up allows you to have efficient access to large assets while also being able to store the exact original package, signed and verified by the developer, with no duplication of storage. This is a compelling benefit that is worth at least some cost, although whether it is the right tradeoff would make an interesting conversation.

          It is not, however, at all clear that it is “a terrible mistake” or that these “constraints are more serious than those small pitfalls”; and it is simply not true that this “actually contributes to the sluggish image of Android”, as almost all resources where this would matter are actually stored uncompressed and are already being memory mapped directly from the APK.

          • http://www.algolia.com/ Julien Lemoine

            Yes I agree, my approach in this post was not good. I took one technical example instead of talking about the real topic: files packaged in a signed archive and extracted to the file system versus a signed package used without extraction (APK).

            Btw, for me the memory-mapping of files is not “covered quite well”; see my other detailed reply, there are too many constraints for it to be well covered.

            Another thing you probably noticed: the title is not “terrible mistake” but “mistake”. I just forgot to remove “terrible” from the URL (terrible mistake was my first title, but I did not find it honest).