Translating multilingual websites the easy way


Note: This post was written on August 9, 2009 on my personal website, Bibakis.com.
It has been moved here, mostly for archival purposes.

Pieter Bruegel the Elder - The Tower of Babel

One of the biggest challenges in web development is making multilingual websites. The easy part is having common functionality across all available versions. The hard part is making sure you don’t have blank spots. That means that if you have an English-French website, you shouldn’t have French words appearing in the English version and vice versa.

The oldest solution I can remember being used in PHP is constants. For example, instead of writing Home page in your HTML, you write __HOME_PAGE__. Then you have one file of constants for every language, which you include in every “visible” PHP script. An if block takes care of including the proper constants file for the currently active language. The major problem with this technique is that if something is left untranslated, you get ugly __CONSTANTS__ all over the place. Few things make a website look more amateurish.
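
As a rough illustration of the constants trick and its failure mode, here is a minimal sketch. The arrays stand in for the per-language files, and the function names load_language_constants and constant_or_name are made up for this example. Note that PHP versions before 8 printed the bare name of an undefined constant, while PHP 8 throws an Error, so the helper reproduces the old behaviour explicitly.

```php
<?php
// Hypothetical per-language constant files, inlined as arrays to keep the sketch self-contained.
$constantFiles = array(
    'en' => array('__HOME_PAGE__' => 'Home page', '__CONTACT__' => 'Contact'),
    'fr' => array('__HOME_PAGE__' => "Page d'accueil"), // __CONTACT__ was never translated
);

// Stand-in for the "if block" that includes the right file for the active language.
function load_language_constants(array $constantFiles, $language)
{
    foreach ($constantFiles[$language] as $name => $text) {
        define($name, $text);
    }
}

// Returns the constant's value, or its bare name when it was never defined,
// which is what older PHP versions printed for an undefined constant.
function constant_or_name($name)
{
    return defined($name) ? constant($name) : $name;
}

load_language_constants($constantFiles, 'fr');

echo constant_or_name('__HOME_PAGE__'), "\n"; // the French text
echo constant_or_name('__CONTACT__'), "\n";   // the raw constant name leaks into the page
```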

In CodeIgniter there is the language class that takes care of this issue. Instead of using constants you simply call $this->lang->line('Home page') to get the translation for the words Home page. The translatable items live in files in the “application/language” folder. The problem with this approach is that when something is left untranslated, there is no easy way to find it.

I recently took over a multilingual e-shop platform project for an agency. A friend and colleague there came up with a solution for this: extend the language class so that the line method wraps a string in marks whenever no translation is available. For example Home page becomes !– Home page –!.
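
The idea can be sketched roughly as follows. This is not my colleague's code and not CodeIgniter's actual language class. MarkedLanguage is a hypothetical standalone class that only imitates a line method, with the marks passed in so they can be customized or disabled.

```php
<?php
// Minimal stand-in for a language class; the class name and constructor are assumptions.
class MarkedLanguage
{
    private $translations;
    private $prefix;
    private $suffix;

    // Pass empty strings for $prefix and $suffix to disable the marks.
    public function __construct(array $translations, $prefix = '!– ', $suffix = ' –!')
    {
        $this->translations = $translations;
        $this->prefix = $prefix;
        $this->suffix = $suffix;
    }

    // Returns the translation, or the original string wrapped in marks when none exists.
    public function line($key)
    {
        if (isset($this->translations[$key]) && $this->translations[$key] !== '') {
            return $this->translations[$key];
        }
        return $this->prefix . $key . $this->suffix;
    }
}

$lang = new MarkedLanguage(array('Home page' => "Page d'accueil"));

echo $lang->line('Home page'), "\n"; // translated
echo $lang->line('Contact'), "\n";   // untranslated, so it comes back wrapped in marks
```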

This time we are more organized and flexible, since we can, for example, enable, disable or customize the !– –! marks. In practice, though, we are still following in the footsteps of the ugly constants trick. We still get !– Ugly –! stuff on our website whenever a translation is forgotten. The worst part is that we still have to scan the entire site by hand and hunt down every little !– untranslated word –!.

To overcome all of the above problems there are two simple things we can do. The first is to add semantics to all the strings we want to translate. This way Home page is different from home page, which makes life easier for the person translating our content, who won’t have to figure out whether a word starts a sentence or not.

The best part is that we can have all the items we want to translate gathered automatically, without the need for spot marks (!– –!). To achieve this, follow these steps:

  1. Create a simple two column table in the database called translations (id INT, item VARCHAR 1024)
  2. When the line method of the language class finds an item that doesn’t have a translation available, check whether this item is already stored in the translations table.
  3. If it is not stored, create a line with the exact text you use in your language files. In our case this is $lang['Home page'] = '';
    Insert this line in the translations table.
  4. Create two pages. One called something like ‘translations’ and another called ‘translations_clear’. The translations page should run a simple select query to get all the items and just display them in the browser. The translations_clear page should just empty the translations table.
  5. Run a spider against your site to make sure all links are visited. Any decent offline browser should do the trick. Windows users can use HTTrack and Linux users wget.
  6. Visit the translations page after the spider is done. You will get a page with all the items as array entries, ready to be translated. Simply copy the entire page to your language file and start translating.
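
Steps 1 to 4 can be sketched roughly like this. It is a sketch under assumptions, not a drop-in CodeIgniter extension. MissingCollector is a hypothetical class name, an in-memory SQLite database (which needs the pdo_sqlite extension) stands in for the real database, and falling back to the original string when no translation exists is my assumption about what the line method should return.

```php
<?php
// Sketch only: SQLite in memory replaces the real database, and the class name is made up.
class MissingCollector
{
    private $db;
    private $translations;

    public function __construct(PDO $db, array $translations)
    {
        $this->db = $db;
        $this->translations = $translations;
        // Step 1: the two-column translations table.
        $db->exec('CREATE TABLE IF NOT EXISTS translations (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            item VARCHAR(1024)
        )');
    }

    // Steps 2 and 3: return the translation, or record the item and fall back to the key.
    public function line($key)
    {
        if (isset($this->translations[$key]) && $this->translations[$key] !== '') {
            return $this->translations[$key];
        }
        // var_export() quotes and escapes the key so the stored line is valid PHP.
        $item = '$lang[' . var_export($key, true) . "] = '';";
        $check = $this->db->prepare('SELECT COUNT(*) FROM translations WHERE item = ?');
        $check->execute(array($item));
        if ((int) $check->fetchColumn() === 0) {
            $insert = $this->db->prepare('INSERT INTO translations (item) VALUES (?)');
            $insert->execute(array($item));
        }
        return $key;
    }

    // Step 4, "translations" page: every collected line, ready to paste into a language file.
    public function pending()
    {
        return $this->db->query('SELECT item FROM translations ORDER BY id')
                        ->fetchAll(PDO::FETCH_COLUMN);
    }

    // Step 4, "translations_clear" page: empty the table.
    public function clear()
    {
        $this->db->exec('DELETE FROM translations');
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$lang = new MissingCollector($db, array('Home page' => "Page d'accueil"));

echo $lang->line('Home page'), "\n"; // translated
echo $lang->line('Contact'), "\n";   // untranslated: shown as-is and recorded
echo $lang->line('Contact'), "\n";   // a second hit does not create a duplicate row
echo implode("\n", $lang->pending()), "\n";
```

In a real project the pending and clear methods would sit behind the two pages from step 4, and the spider from step 5 would be what triggers the calls to line.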

This way you can have web pages that are translated in one step, right before you fire up your FTP client for the final upload. Also, when something is added later but is not translated, you won’t get __UGLY__ stuff in your page. You don’t have to use CodeIgniter to use this technique. All it takes is that you use a function to fetch the translated items.