I am going to work on a project where a fairly large web app needs to be tweaked to handle several languages. The thing runs on hand-crafted PHP code, but it's pretty clean.
If it's multi-byte character support then it might be worth checking out the multibyte string functions in PHP:
http://uk.php.net/manual/en/book.mbstring.php
These will better handle multi-byte characters.
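To illustrate the difference, here is a minimal sketch that assumes the mbstring extension is enabled and that the text is UTF-8. The sample word is only an example; its bytes are written out explicitly so the result does not depend on how the file is saved.

```php
<?php
// "héllo" spelled with explicit UTF-8 bytes (é = 0xC3 0xA9).
$word = "h\xC3\xA9llo";

// The byte-oriented function counts bytes, not characters.
$bytes = strlen($word);               // 6

// The multibyte functions count and cut on character boundaries.
$chars = mb_strlen($word, 'UTF-8');   // 5
$firstTwo = mb_substr($word, 0, 2, 'UTF-8');
$upper = mb_strtoupper($word, 'UTF-8');

echo $bytes, " bytes, ", $chars, " characters\n";
echo $firstTwo, "\n";
echo $upper, "\n";
```

Using substr($word, 0, 2) instead would cut the é in half and leave an invalid byte sequence, which is the kind of bug that only shows up once non-English text arrives.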
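It is important to note that there are two steps involved before translating:
1. Internationalization (i18n): enabling your site to handle multiple languages
2. Localization (l10n): adapting the site, including its texts, to each language you plan to support
See more on this in Wikipedia.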
Step 1 would require you to take into account that some languages are written right to left (RTL) and that some use non-European scripts such as Japanese or Chinese. If you are not planning to handle these languages and characters, it might be simpler.
For this type of situation I would prefer to have a language file (actually as many language files as languages I plan to support, naming each as langcode.php, as in en.php or fr.php) with an associative array containing all the texts used in the site. The procedure would go as follows:
- a $lang['sectionname'][] array
- a $lang['sectionname']['textname'] entry
- a Lang.php class that would receive a lang parameter upon instantiation but would have a default in case no lang is received (this method loads langcode.php depending on the parameter or a default depending on your preferred language)
- a setPage() method that would receive the page/section you will be displaying
- a show() method that would receive the text to be displayed (show() would be called as many times as texts are shown in a given page... show() being a kind of wrapper for echo $lang['mypage']['mytext'])
This way you could have as many languages as you want in a very easy way. You could even have a language admin where you open your base language page (you actually just read recursively the arrays and display them in textareas) and can then "Save as..." some other language.
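The procedure above could be sketched roughly like this. It is only an illustration under a few assumptions of mine: PHP 7.4 or later, language files that fill a $lang array (for example en.php containing $lang['home']['title'] = 'Welcome';), and a $dir argument plus a fallback to the key name that are not part of the description above.

```php
<?php
// Hypothetical Lang class following the steps described above.
class Lang
{
    private array $texts = [];
    private string $page = '';

    // $code is the "lang" parameter; $dir and $default are assumptions.
    public function __construct(?string $code = null, string $dir = __DIR__, string $default = 'en')
    {
        // Accept only short lowercase codes so a request parameter
        // can never be used to include an arbitrary file.
        if ($code === null || !preg_match('/^[a-z]{2,3}$/', $code)) {
            $code = $default;
        }
        $file = "$dir/$code.php";
        if (!is_file($file)) {
            $file = "$dir/$default.php";
        }

        $lang = [];              // the language file is expected to fill this
        if (is_file($file)) {
            include $file;       // runs in this method's scope
        }
        $this->texts = $lang;
    }

    // Select the page/section whose texts will be shown.
    public function setPage(string $page): void
    {
        $this->page = $page;
    }

    // Wrapper for echo $lang['mypage']['mytext']; prints the key name
    // itself when no translation exists, so gaps are easy to spot.
    public function show(string $name): void
    {
        echo $this->texts[$this->page][$name] ?? $name;
    }
}
```

Usage would then look like $l = new Lang($_GET['lang'] ?? null); $l->setPage('home'); $l->show('title'); with one langcode.php file per language sitting next to the class.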
I use a similar approach on my site. It is only one page, but I have made multi-page sites with this idea.
If you have user-submitted content or some rather complicated CMS it would be a different story. You could look for i18n-friendly frameworks (Drupal comes to mind).
I use an hl parameter together with gettext, combining the translations that already ship with the engine with my own .po files, so new translations and languages appear whenever either the engine or my app adds them. My django/gae example:
{% get_current_language as LANGUAGE_CODE %}
{{ LANGUAGE_CODE }}
{% get_available_languages as LANGUAGES %}
{% for LANGUAGE in LANGUAGES %}
  {% ifnotequal LANGUAGE_CODE LANGUAGE.0 %}{{ LANGUAGE.0 }}{% endifnotequal %}
{% endfor %}
This avoids duplicates and makes full use of the translations that are already there, so something still missing, e.g. Arabic month names, appears directly as soon as either the engine team or the app adds it.
There are a number of ways of tackling this. None of them is "the best way" and all of them have problems in the short term or the long term. The very first thing to say is that multilingual sites are not easy: translators are lovely people but hard to work with, and most programmers see the problem as a technical one only. There is also another dimension, outside the scope of this answer, as to whether you are translating or localising. Localising involves looking at the target audience's cultural mores and then tailoring language, style, layout, colour, typeface etc. to that culture. Finally, do not use MT (Machine Translation) for anything serious or anything that needs to be accurate, and when acquiring translators ensure that they are translating from a foreign language into their native language, which means that they understand all the nuances of the target language.
Right. Solutions. On the basis that you do not want to rewrite the site, simply clone the site you have and translate the copies into the target languages. Assuming the code base is stable, you can use a VCS to manage any code changes. You can tweak individual parts of each site to fit the target language. For example, French text is on average 30% larger than the equivalent English text, so using one site to deliver both means you may (will) have formatting problems and need to swap a different CSS file in and out depending on the language. It might seem a clunky way to do it, but then how long are the sites going to exist? The management overhead of doing it this way may well be less than that of the other options.
Second way without rebuilding. Replace all content in the current site with tags and then put the different languages in files or DB tables. Sniff the user's desired language (do you have registered users who can set a preference, do you want to read the browser language tag, or is it going to be the URL, dot-com, dot-fr, dot-de, that makes the choice?) and then replace the tags with the text in the target language. Then you need to address the sizing issues and the image issues separately. This solution is in effect what frameworks like Symfony and Zend do to implement l10n.
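A rough sketch of the tag approach, for illustration only. The function names, the {{name}} tag syntax and the sample texts are all made up here; the language is picked from an Accept-Language style header, and a real site might use a user preference or the domain instead.

```php
<?php
// Hypothetical helper: choose a supported language from a header such
// as "fr-CH, fr;q=0.9, en;q=0.8", falling back to $default.
function pickLanguage(string $header, array $supported, string $default): string
{
    $weights = [];
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue;
        }
        $q = 1.0; // a missing q value means full preference
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q=([0-9.]+)\s*$/', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        // Reduce "fr-CH" to "fr" and keep the highest weight seen.
        $primary = explode('-', $tag)[0];
        if (!isset($weights[$primary]) || $q > $weights[$primary]) {
            $weights[$primary] = $q;
        }
    }
    arsort($weights); // most preferred first
    foreach (array_keys($weights) as $lang) {
        if (in_array($lang, $supported, true)) {
            return $lang;
        }
    }
    return $default;
}

// Hypothetical helper: replace {{name}} tags with translated text and
// leave unknown tags untouched so missing translations stay visible.
function renderTags(string $template, array $texts): string
{
    return preg_replace_callback(
        '/\{\{(\w+)\}\}/',
        function (array $m) use ($texts): string {
            return $texts[$m[1]] ?? $m[0];
        },
        $template
    );
}

$texts = [
    'en' => ['greeting' => 'Welcome'],
    'fr' => ['greeting' => 'Bienvenue'],
];
$lang = pickLanguage('fr-CH, fr;q=0.9, en;q=0.8', array_keys($texts), 'en');
echo renderTags('<h1>{{greeting}}</h1>', $texts[$lang]), "\n";
```

The $texts array stands in for whatever file or DB table holds the translations; the sizing and image issues still have to be handled separately.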
Then you could rebuild with a framework or with gettext and possibly have a cleaner solution, but remember that frameworks were designed to solve other problems, not translation, and the translation component has come into the framework as a partial solution, not the full one.
The big problem with all the solutions is ongoing maintenance, because you have not only a code base but also multiple language bases to maintain. Unless your all-in-one solution is really clever and effective, the ongoing task will be difficult.
You could look at Zend_Translate. It's pretty comprehensive and well documented, and the overall code quality is great. It also allows you to use a unified API for gettext, CSV, DB, ini file, array or whatever you end up saving your translated strings in.
Also, look at/watch this thread: What are good tools/frameworks for i18n of a php codebase?. It seems similar to your question.
Work with language files.
That's for small sites.
If the site gets bigger, replace the files with a DB. :)