Which ISO format should I use to store a user's language code?

Submitted by 蹲街弑〆低调 on 2019-12-03 09:42:51
sorin

You should use IETF language tags, because they are already used by HTTP, HTML, XML and many other technologies. They build on several standards, including the ISO 639 family (yes, language, region and culture selection is not so simple to define).
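To make the structure of these tags concrete, here is a minimal Python sketch that splits a simple tag into language, script and region subtags. The function name `split_tag` is hypothetical, the pattern covers only the common `language[-script][-region]` shape (real tags also allow variants, extensions and private-use subtags), and accepting `_` as a separator is a convenience I am assuming, not part of the standard.

```python
import re

# Simplified pattern: language (2-3 letters), optional script (4 letters),
# optional region (2 letters or 3 digits). This is an assumption-laden
# subset of the full tag grammar, not a complete parser.
_TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Split a simple language tag into subtags, using conventional casing."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return {
        "language": match.group("language").lower(),  # e.g. "zh"
        "script": script.title() if script else None,  # e.g. "Hant"
        "region": region.upper() if region else None,  # e.g. "TW"
    }
```

Tags are case-insensitive, so the casing applied here (lowercase language, title-case script, uppercase region) is only the customary way of writing them.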

I wrote a more detailed article about choosing and using language codes properly. The idea is to use the simplest, shortest ISO 639-1 codes and to be more specific only in special cases. The article lists codes for the ~30 most used languages, with the reasons why I consider one alternative better than another.

In case you want to skip reading the entire article, here is a short list of language codes (not to be confused with country codes): ar, cs, da, de, el, en, en-gb, es, fr, fi, he, hu, it, ja, ko, nb, nl, pl, pt, pt-pt, ro, ru, sv, tr, uk, zh, zh-hant

The following points may not be obvious but should be borne in mind:

  • en is used for American English (en-us); British English uses en-gb
  • pt is used for Brazilian Portuguese (pt-br), not for pt-pt, which has far fewer speakers
  • zh is used instead of zh-hans, zh-CN,...
  • zh-hant (Traditional Chinese) is used instead of more specific codes like zh-hant-TW or zh-TW

You can find more explanations inside the article.
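The conventions in the list above could be applied when storing a user's language roughly like this. It is a sketch only: the names `SUPPORTED`, `ALIASES` and `normalize_language` are hypothetical, storing codes in lowercase is my assumption, and so is falling back to `en` when nothing matches.

```python
# Short list of codes from the answer, kept lowercase for storage.
SUPPORTED = {
    "ar", "cs", "da", "de", "el", "en", "en-gb", "es", "fr", "fi", "he",
    "hu", "it", "ja", "ko", "nb", "nl", "pl", "pt", "pt-pt", "ro", "ru",
    "sv", "tr", "uk", "zh", "zh-hant",
}

# More specific tags folded into the shorter codes, per the bullet points.
ALIASES = {
    "en-us": "en",
    "pt-br": "pt",
    "zh-hans": "zh",
    "zh-cn": "zh",
    "zh-hant-tw": "zh-hant",
    "zh-tw": "zh-hant",
}


def normalize_language(tag, default="en"):
    """Map an incoming tag to one of the supported codes (assumed fallback: default)."""
    key = tag.strip().replace("_", "-").lower()
    key = ALIASES.get(key, key)
    # Drop trailing subtags until a supported code is found or nothing is left.
    while key and key not in SUPPORTED:
        key = key.rpartition("-")[0]
        key = ALIASES.get(key, key)
    return key or default
```

For example, `normalize_language("pt-BR")` gives `pt`, `normalize_language("de-AT")` gives `de`, and an unknown code such as `xx` falls back to the default.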

I would go with a derivative of ISO 639. Specifically I like to use this: http://en.wikipedia.org/wiki/IETF_language_tag

I'm no expert, but every site I've ever seen uses ISO 639-1, including the current site I'm working on.

It works for us!

I've only ever seen 2-character language codes in use - so I'd recommend going with them unless your work involves delving into linguistics in some way. If all you're doing is customizing the browsing experience for the world at large, you won't need the extra repertoire offered by 3-character codes.

ISO 639-1 alpha-2 codes are used pretty much universally.

They are used for example in HTTP content negotiation. If you ever wondered how an international website can automatically show you their homepage in your native language, that's how it works. (Although it's sometimes kinda annoying. I, for example, often get shown the default Apache homepage in German, because the webmaster turned on content negotiation, but only put content for English in.)
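As a rough illustration of that negotiation, here is a Python sketch that reads an `Accept-Language` header and picks a language the site actually has. The function names and the `en` fallback are assumptions of mine, and the matching is deliberately simpler than what a real server such as Apache does.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # weight used when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            weighted.append((quality, tag))
    # The sort is stable, so tags with equal weight keep their header order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in weighted]


def choose_language(header, available, default="en"):
    """Pick the first requested language that is available, else the default."""
    for tag in parse_accept_language(header):
        if tag == "*":
            continue
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # fall back from e.g. "de-de" to "de"
        if primary in available:
            return primary
    return default
```

With a German browser sending `de-DE,de;q=0.9,en;q=0.8` and a site that only has English content, `choose_language(header, {"en"})` returns `en`, which is the outcome you would want in the situation described above.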

Most web browsers use them directly in their settings dialog box.

Most operating systems use them in their settings dialog boxes or configuration files.

Wikipedia uses them in their server names for the different language versions.

In other words: if your users aren't native English speakers, they will probably already have encountered them when configuring their software, because otherwise they wouldn't be able to use their computers.

The other members of the ISO 639 family are mostly of interest to linguists. Unless you expect Jesus Christ himself (ISO 639-2 alpha-3 code arc) to visit your website, or maybe Klingons (tlh), ISO 639-1 has more languages than you can ever hope to support.
