What is a good definition for language code and locale codes?

青春惊慌失措 2021-01-31 08:59

  • When should I use en_GB and when en-GB?
  • What is the difference?
  • Is there an ISO name for this ISO 639-1 (language) and ISO 3166
    4 answers
    • 2021-01-31 09:27

      It depends on the technology. In Java, for example, Locale.UK gives you the code en_GB (if you care enough to call toString()). That is the form you would pass between modules (unless you pass the concrete Locale type) and the form you would write into configuration files (e.g. faces-config.xml).
      In .NET, on the other hand, you would certainly use en-GB.

      The en-GB form is definitely more common, and in most cases it is the form you should use.

      The difference is obvious: the separator :) Otherwise there is no difference in meaning, although a specific technology might impose some constraints on the locale identifier.

      To my knowledge, there is no ISO normative document that covers the combination of language and country. In software internationalization it is part of the locale model.

    • 2021-01-31 09:37

      A locale is a combination of language and region (usually a country).

      The separator can be _ or -, but the recommended one is the dash.

      You are probably looking for the BCP 47 standard, which uses language codes from ISO 639-1 and region/country codes from ISO 3166-1 alpha-2 (usually written in upper case).
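
As a rough illustration of the language-plus-region form described above, here is a minimal sketch that joins a two-letter language code and a two-letter region code with a dash. The helper name `make_locale_tag` is hypothetical, and the check only validates the shape of each part (two ASCII letters); it does not confirm that a code is actually registered in ISO 639-1 or ISO 3166-1.

```python
import re

# Shape check only: exactly two ASCII letters. This does NOT verify that
# the code is actually assigned in ISO 639-1 or ISO 3166-1 alpha-2.
_TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def make_locale_tag(language: str, region: str) -> str:
    """Join a language code and a region code with a dash (hypothetical helper)."""
    if not _TWO_LETTERS.fullmatch(language):
        raise ValueError(f"not a two-letter language code: {language!r}")
    if not _TWO_LETTERS.fullmatch(region):
        raise ValueError(f"not a two-letter region code: {region!r}")
    # Conventional casing: lower-case language, upper-case region.
    return f"{language.lower()}-{region.upper()}"


print(make_locale_tag("en", "gb"))  # en-GB
```

Full BCP 47 tags can carry more subtags (script, variants, extensions), so this only covers the simple case discussed here.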

      You can find more information about them here http://blog.i18n.ro/simplified-locale-codes/

    • 2021-01-31 09:41

      There are several systems for locale identifiers. Many of them look similar at first glance, but not when you go deeper:

      Some examples (Serbian-Serbia with Latin Script, Japanese-Japan with radical sorting):

      • UTS-35, ICU, Mac OS X, Flash: sr-Latn-RS, ja-JP@collation=radical
      • Newer UTS-35, BCP 47 extension U: sr-Latn-RS, ja-JP-u-co-unihan
      • Win 2000, XP: 0x81a, 0x10411
      • Vista, Win 7: sr-Latn-CS, ja-JP_radical
      • Java: sr_CS, ja_JP
      • Java 7: sr_RS, ja_JP
      • Linux: sr_RS@latin, ja_JP.utf8

      Think of it like different ways to talk about colors (RGB, CMYK, HSV, Pantone, etc.)

      So - vs. _ does not make sense unless you specify the environment you are using. Use - and Java will not understand it; use _ and Windows will not understand it. ICU (and systems built on top of it) accepts both - and _, but produces the _ style.

      There is no ISO standard that covers the language-country combination, but there are ISO standards that cover the individual parts (language, country, script). The exact version of each ISO standard also depends on the system used for locale identifiers.


      In general you should accept both _ and -, and generate only one ("be liberal in what you accept and strict in what you emit"), as ICU does.
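
The "accept both, emit one" advice could be sketched as follows. This is only an illustration: `normalize_locale` is a hypothetical helper, it emits the dash style with the usual casing conventions (lower-case language, title-case four-letter script, upper-case two-letter region), and it handles only subtag-style identifiers. POSIX-style names with `.` or `@` parts, such as ja_JP.utf8, are rejected rather than interpreted.

```python
def normalize_locale(identifier: str) -> str:
    """Accept '_' or '-' as the separator; always emit '-' (hypothetical helper)."""
    parts = identifier.replace("_", "-").split("-")
    # Every subtag must be non-empty ASCII letters/digits.
    if not all(part.isascii() and part.isalnum() for part in parts):
        raise ValueError(f"unsupported locale identifier: {identifier!r}")

    result = [parts[0].lower()]  # language subtag
    in_extension = False
    for part in parts[1:]:
        if in_extension or len(part) == 1:
            # A single-character subtag (e.g. 'u') starts an extension;
            # everything from there on is kept lower-case.
            in_extension = True
            result.append(part.lower())
        elif len(part) == 4 and part.isalpha():
            result.append(part.title())  # script, e.g. Latn
        elif len(part) == 2 and part.isalpha():
            result.append(part.upper())  # region, e.g. GB
        else:
            result.append(part.lower())  # anything else, e.g. a variant
    return "-".join(result)


print(normalize_locale("en_GB"))       # en-GB
print(normalize_locale("sr_latn_rs"))  # sr-Latn-RS
```

A system that has to emit the underscore style instead could apply the same parsing and only change the final join character.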

      If you communicate with systems that use another type of locale identifier, you will have to map to and from your system, and that will force you to use _ or -. Some of the mappings will be lossy (there is no way to specify alternate calendars in Windows or Linux, or alternate sorting or scripts in Java older than 7, etc.), and round-tripping might not be possible (somewhat like conversions between RGB and CMYK).

      Addition: things differ not only between systems, but can also change over time. For instance, Java 7 added support for sr_RS and for scripts, Windows keeps adding support for more locales, new countries get created (the Sudan split, Russia, Serbia) or disappear (East Germany, the U.S.S.R., Yugoslavia), and so on.

      For internal representation you might want to choose the most powerful one, the one that can represent everything: UTS-35 / BCP 47 (also used by CLDR and ICU).
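
To make the lossy-mapping point concrete, here is a small sketch that converts a dash-style tag into a Linux-style name such as sr_RS@latin and reports which subtags could not be carried over. Everything here is an assumption for illustration: `to_posix_style` is a hypothetical helper, the `SCRIPT_TO_MODIFIER` table is deliberately tiny and incomplete, and the input is assumed to be already in canonical casing (for example sr-Latn-RS).

```python
from typing import List, Tuple

# Hypothetical, incomplete table: script subtag -> POSIX-style modifier.
SCRIPT_TO_MODIFIER = {"Latn": "latin", "Cyrl": "cyrillic"}


def to_posix_style(tag: str) -> Tuple[str, List[str]]:
    """Map language[-Script][-REGION] to a POSIX-style name.

    Returns the name plus the list of subtags that were dropped,
    which is what makes the mapping lossy.
    """
    parts = tag.split("-")
    language, rest = parts[0], parts[1:]
    script = None
    region = None
    dropped: List[str] = []

    for index, part in enumerate(rest):
        if len(part) == 4 and part.isalpha() and script is None and region is None:
            script = part
        elif len(part) == 2 and part.isalpha() and region is None:
            region = part
        else:
            # Extensions, variants, etc. have no slot in this target format.
            dropped.extend(rest[index:])
            break

    name = language
    if region is not None:
        name += "_" + region
    if script is not None:
        modifier = SCRIPT_TO_MODIFIER.get(script)
        if modifier is not None:
            name += "@" + modifier
        else:
            dropped.insert(0, script)  # unknown script: cannot be expressed
    return name, dropped


print(to_posix_style("sr-Latn-RS"))         # ('sr_RS@latin', [])
print(to_posix_style("ja-JP-u-co-unihan"))  # ('ja_JP', ['u', 'co', 'unihan'])
```

Because the collation extension is dropped in the second call, converting the result back cannot recover the original tag, which is the round-tripping problem described above.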

    • 2021-01-31 09:44

      For the Internet, this is covered by RFC 3066 (since superseded by RFC 5646, part of BCP 47), which specifies "en-GB", not "en_GB".
