Abdullah Diab’s Blog

Python Arabic Text Reshaper

Today I was trying to generate PDF reports using Geraldo Reports, and I needed those reports to contain Arabic text. Arabic is a special script with two essential features:

  1. It is written from right to left.
  2. The characters change shape according to their surrounding characters.
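The second feature is visible in Unicode itself: besides the ordinary letter stored in text, Unicode has a separate "presentation form" code point for each shape a letter can take. The snippet below is only an illustration using Python's standard unicodedata module. It looks up the four shapes of the letter BEH (U+0628) by name and prints their code points and their compatibility decompositions, which point back to the base letter.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the form stored in ordinary text

# Each contextual shape has its own code point (Arabic Presentation Forms-B block).
forms = {
    form: unicodedata.lookup(f"{unicodedata.name(BEH)} {form} FORM")
    for form in ("ISOLATED", "FINAL", "INITIAL", "MEDIAL")
}

for form, ch in forms.items():
    # decomposition() shows which base letter each shape stands for
    print(f"{form:<8} U+{ord(ch):04X} {unicodedata.decomposition(ch)}")
```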

So when you try to print Arabic text in an application (or a library) that doesn’t support Arabic, you’re pretty likely to end up with something that looks like this:

Arabic text, broken, left to right

We have two problems here: first, the characters are in their isolated form, which means that every character is rendered regardless of its surroundings; second, the text is written from left to right.

To solve the second issue, all we have to do is apply the Unicode Bidirectional Algorithm, which has a pure-Python implementation in python-bidi. Using it, you’ll end up with something that looks like this:

Arabic text, broken, right to left
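For intuition, here is a heavily simplified sketch of what the reordering step does to one right-to-left line. It is neither python-bidi nor the full algorithm: the hypothetical toy_display function assumes an RTL paragraph made only of Arabic/Hebrew letters, Latin letters and spaces (no digits, punctuation, mirrored brackets or explicit embeddings). It gives each character level 1 (RTL) or level 2 (LTR), then reverses every LTR run and finally reverses the whole line.

```python
import unicodedata

STRONG = {"L", "R", "AL"}  # strong bidi classes: Latin, Hebrew, Arabic


def _strong_neighbor(types, i, step):
    """Bidi class of the nearest strong character before/after index i, or None."""
    j = i + step
    while 0 <= j < len(types):
        if types[j] in STRONG:
            return types[j]
        j += step
    return None


def toy_display(text):
    """Toy logical-to-visual reorder for an RTL paragraph (assumptions above)."""
    types = [unicodedata.bidirectional(ch) for ch in text]
    if not set(types) <= STRONG | {"WS"}:
        raise ValueError("this sketch only handles letters and spaces")

    levels = []
    for i, t in enumerate(types):
        if t == "L":
            levels.append(2)  # LTR text embedded in the RTL paragraph
        elif t in ("R", "AL"):
            levels.append(1)  # RTL text at the paragraph level
        else:
            # A space belongs to an LTR run only if LTR letters surround it;
            # otherwise it takes the paragraph's RTL level.
            before = _strong_neighbor(types, i, -1)
            after = _strong_neighbor(types, i, +1)
            levels.append(2 if before == after == "L" else 1)

    chars = list(text)
    i = 0
    while i < len(chars):  # reverse each maximal LTR run (level 2) ...
        j = i
        while j < len(chars) and levels[j] == 2:
            j += 1
        if j > i:
            chars[i:j] = chars[i:j][::-1]
            i = j
        else:
            i += 1
    return "".join(reversed(chars))  # ... then reverse the whole RTL line
```

Real text with digits, punctuation and nested directions needs the full algorithm, which is what get_display from python-bidi provides.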

The only issue left to solve is to reshape those characters and replace them with their correct shapes according to their surroundings.
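To make the idea concrete, here is a minimal sketch of contextual shaping. It is not the author’s port of Better Arabic Reshaper; the helper names are hypothetical, and it relies on the standard unicodedata module to find each letter’s presentation forms by name. It works on text in logical order, before the bidi step. A letter links to the previous letter if it has a final form and the previous letter has an initial form (and symmetrically for the next letter). Diacritics (combining marks) are skipped when looking for neighbours. Lam-alef ligatures, tatweel and ZWJ are deliberately not handled.

```python
import unicodedata


def _form(ch, form):
    """Presentation-form character for an Arabic letter, or None if Unicode has none."""
    name = unicodedata.name(ch, "")
    if not name.startswith("ARABIC LETTER"):
        return None
    try:
        return unicodedata.lookup(f"{name} {form} FORM")
    except KeyError:
        return None


def _neighbor(text, i, step):
    """Nearest character before/after index i, skipping combining marks (diacritics)."""
    j = i + step
    while 0 <= j < len(text) and unicodedata.category(text[j]) == "Mn":
        j += step
    return text[j] if 0 <= j < len(text) else None


def toy_reshape(text):
    """Replace each Arabic letter with the shape its neighbours call for."""
    out = []
    for i, ch in enumerate(text):
        if _form(ch, "ISOLATED") is None:  # not a shapeable letter: copy unchanged
            out.append(ch)
            continue
        prev_ch, next_ch = _neighbor(text, i, -1), _neighbor(text, i, +1)
        # A link needs both letters to allow it: the earlier one must connect
        # forward (has an initial form), the later one backward (has a final form).
        link_prev = (prev_ch is not None and _form(prev_ch, "INITIAL") is not None
                     and _form(ch, "FINAL") is not None)
        link_next = (next_ch is not None and _form(ch, "INITIAL") is not None
                     and _form(next_ch, "FINAL") is not None)
        form = {(True, True): "MEDIAL", (True, False): "FINAL",
                (False, True): "INITIAL", (False, False): "ISOLATED"}[(link_prev, link_next)]
        out.append(_form(ch, form) or ch)
    return "".join(out)
```

For example, in "باب" the first BEH gets its initial form, the ALEF its final form (ALEF never connects to the letter after it), and the last BEH stays isolated.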

I solved this issue more than four years ago in a small application that I wrote in Visual Basic; my solution was naive, but it worked well. A few days ago I faced the same problem, rendering Arabic text correctly, but this time on Android. I searched and used the solution in this Stack Overflow answer, which is pretty similar to the solution provided in Better Arabic Reshaper.

Today I ported the solution in Better Arabic Reshaper from Java to Python, tweaked it a little, and used it to successfully render Arabic text in a PDF. The result was:

Arabic text, correctly shaped, right to left

Pretty cool, right? Here is another test, with some English text and some diacritics in it:

Arabic text, with English text, correctly shaped, correctly directed, with diacritics

It looks fine! In Microsoft Word, the same text looks like this:

Arabic text, with English text, correctly shaped, correctly directed, with diacritics, in Microsoft Word

Amazing! Now it’s your turn to use the ported library along with python-bidi to solve those issues.

Usage

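import arabic_reshaper
from bidi.algorithm import get_display

#...
# Replace each letter with the presentation form that fits its neighbours
reshaped_text = arabic_reshaper.reshape(u'اللغة العربية رائعة')
# Reorder the reshaped text from logical order into display (visual) order
bidi_text = get_display(reshaped_text)
pass_arabic_text_to_render(bidi_text)  # <-- This function does not really exist
#...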

The pass_arabic_text_to_render function here is imaginary; it is only there to show that bidi_text is the variable you would use in your code afterwards, for example to print it in a PDF or to draw it on an image.

Demo

You can try an online demo of this script on my page here: Arabic Reshaper Online.

Download

The source code is licensed under the MIT License.

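You can install it into your Python installation with pip:

$ pip install arabic-reshaper

The usage example above also imports python-bidi, which is installed the same way:

$ pip install python-bidi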

Project on GitHub

Source code download from GitHub

Have fun واستمتع (and enjoy)! 🙂