Today I was trying to generate PDF reports using Geraldo Reports, and I needed those reports to contain Arabic text. Arabic script has two essential features:
- It is written from right to left.
- The characters change shape according to their surrounding characters.
So when you try to print Arabic text in an application (or a library) that doesn’t support Arabic, you’re pretty likely to end up with something that looks like this:
We have two problems here. First, the characters are in their isolated forms, which means that every character is rendered without regard to its neighbors. Second, the text is written from left to right.
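To make those two problems a bit more concrete, here is a small sketch that only uses Python's standard unicodedata module (it is not part of the reshaper itself). It takes the letter BEH as an example and shows that Unicode marks it as a right-to-left Arabic letter, and that its four contextual shapes exist as separate presentation-form code points:

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, picked only as an example

# Bidirectional class: "AL" is a right-to-left Arabic letter, "L" is left-to-right.
print(unicodedata.bidirectional(BEH), unicodedata.bidirectional("a"))

# The four contextual shapes of BEH live at U+FE8F..U+FE92.
for code_point in range(0xFE8F, 0xFE93):
    shape = chr(code_point)
    # Compatibility normalization maps every shape back to the same base letter.
    assert unicodedata.normalize("NFKC", shape) == BEH
    print(f"U+{code_point:04X} {unicodedata.name(shape)}")
```

A renderer that knows nothing about Arabic simply draws the base letter every time, which is why you get the isolated shapes.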
To solve the latter issue all we have to do is to use the Unicode bidirectional algorithm, which is implemented purely in Python in python-bidi. If you use it you’ll end up with something that looks like this:
The only issue left to solve is to reshape those characters and replace them with their correct shapes according to their surroundings.
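To give an idea of what reshaping means, here is a deliberately tiny sketch. It is not the ported library and not the Better Arabic Reshaper algorithm, just an illustration under simplifying assumptions: it ignores diacritics, tatweel, and the Lam-Alef ligatures, and it finds the shapes by looking up Unicode character names. The names contextual_form and toy_reshape are made up for this example.

```python
import unicodedata


def contextual_form(letter, position):
    """Return the presentation form of letter for position, or None if there is none."""
    try:
        return unicodedata.lookup(f"{unicodedata.name(letter)} {position} FORM")
    except (KeyError, ValueError):
        return None


def toy_reshape(text):
    """Pick a shape for each letter based on its neighbors (logical order)."""
    shaped = []
    for i, letter in enumerate(text):
        before = text[i - 1] if i > 0 else None
        after = text[i + 1] if i + 1 < len(text) else None
        # A letter connects to the previous one only if that letter can connect
        # forward (it has an initial form) and this one has a final form.
        joins_before = (before is not None
                        and contextual_form(before, "INITIAL") is not None
                        and contextual_form(letter, "FINAL") is not None)
        # Same idea in the other direction.
        joins_after = (after is not None
                       and contextual_form(letter, "INITIAL") is not None
                       and contextual_form(after, "FINAL") is not None)
        if joins_before and joins_after:
            position = "MEDIAL"
        elif joins_before:
            position = "FINAL"
        elif joins_after:
            position = "INITIAL"
        else:
            position = "ISOLATED"
        # Characters without presentation forms (Latin, spaces, ...) stay as they are.
        shaped.append(contextual_form(letter, position) or letter)
    return "".join(shaped)


print([f"U+{ord(c):04X}" for c in toy_reshape("\u0628\u0627\u0628")])
# → ['U+FE91', 'U+FE8E', 'U+FE8F']
```

In the word above (BEH, ALEF, BEH) the first BEH takes its initial shape and ALEF its final shape, but the last BEH stays isolated because ALEF never connects to the letter after it.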
I solved this issue more than four years ago in a small application that I wrote in Visual Basic. My solution was naive, but it worked well. A few days ago I faced the same problem (rendering Arabic text correctly), this time on Android, so I searched and used the solution in this SO answer, which is pretty similar to the solution provided in Better Arabic Reshaper.
Today I ported the solution in Better Arabic Reshaper from Java to Python, tweaked it a little bit, and used it to successfully render Arabic text in PDF, and the result was:
Pretty cool, right? Here is another test with English text and some diacritics in it:
It looks fine! In Word the same text looks like this:
Amazing! Now it is time for you to use the ported library along with python-bidi to solve those issues:
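import arabic_reshaper
from bidi.algorithm import get_display
# Replace each letter with the shape that fits its neighbors
reshaped_text = arabic_reshaper.reshape(u'اللغة العربية رائعة')
# Reorder the reshaped text for right-to-left display
bidi_text = get_display(reshaped_text)
pass_arabic_text_to_render(bidi_text) # <-- This function does not really exist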
The pass_arabic_text_to_render function is imaginary. It is only there to show that bidi_text is the variable you would use in your code afterwards, for example to print it in a PDF or to draw it on an image.
You can try an online demo of this script on my Python/Django site here: Arabic Reshaper Online.
The source code is licensed under the GNU General Public License (GPL).
Have fun واستمتع! 🙂