Why develop our own Unicode Library?

Unicode Logo from Wikipedia

At one time or another, most developers run into bugs or problems with Unicode (a Google search for "unicode bug developer" returned about 3,720,000 results at the time of this writing). Let me tell you about my experience over the last decade and why we have now implemented our own Unicode library to produce exactly the same results across devices and languages.

I first started using Unicode in 2004, when I was developing text-mining software specialized in information extraction. The software was implemented entirely in C++, and I used IBM's ICU library for Unicode compliance (all strings were stored in UTF-16). I also used some of ICU's decomposition-based normalization functions, and I did not notice any major problems at the time. I began to understand the dark side of Unicode later, when I used it in other languages such as Java, Python and, later still, Objective-C. My first surprise came when I realized that a simple isAlpha(unicodechar c) method can return different results depending on the language!
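To make that surprise concrete, here is a minimal Python sketch of how two reasonable definitions of "alphabetic" can disagree on the same character. Both helper functions are hypothetical names introduced for illustration. They are not the API of ICU or of any library mentioned in this post. The first counts only characters whose Unicode general category is a letter (which is how Python documents its built-in `str.isalpha`), and the second also accepts "letter numbers" (category `Nl`) such as Roman numerals.

```python
import unicodedata


def is_alpha_letters_only(ch: str) -> bool:
    """Hypothetical definition A: general category is a letter (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def is_alpha_with_letter_numbers(ch: str) -> bool:
    """Hypothetical definition B: letters plus letter numbers (category Nl)."""
    category = unicodedata.category(ch)
    return category.startswith("L") or category == "Nl"


# "a", CJK ideograph U+4E2D, ROMAN NUMERAL FOUR U+2163, and the digit "5".
samples = ["a", "\u4e2d", "\u2163", "5"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        is_alpha_letters_only(ch),
        is_alpha_with_letter_numbers(ch),
    )
```

The two definitions agree on the ordinary letters and on the digit, but they split on U+2163: definition A rejects it and definition B accepts it. Any code that relies on an "is this a letter?" check can therefore behave differently once it is ported to a runtime that made the other choice.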
