Unicode lowercase characters?

Question

I read somewhere that there are characters other than A-Z that have a lowercase equivalent in Unicode. Which characters are these, and why would any other character need an upper and lower case?

Answer 1:

The English language, and even that strange variant, American English :-) , is not the only language on the planet. There are some very strange looking ones (at least to those familiar with the Latin-based characters) but even Latin-based ones have minor variations.

Two that I am acquainted with on more than a casual basis are Greek and German:

Αα Ββ Γγ Δδ Εε Ζζ  Ηη Θθ Ιι Κκ Λλ Μμ
Νν Ξξ Οο Ππ Ρρ Σσς Ττ Υυ Φφ Χχ Ψψ Ωω

Aa Ää Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn
Oo Öö Pp Qq Rr Ss ß  Tt Uu Üü Vv Ww Xx Yy Zz

That's why we're not allowed to use bits of code like:

char lower = upper - 'A' + 'a';

any more. Doing something like that in a company that takes i18n seriously is near grounds for dismissal. Using Unicode-aware toLower()/toUpper()-type functions is the better way to go.
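To see why the arithmetic trick breaks, here is a minimal Python sketch. The helper name `ascii_trick_lower` is made up for illustration; it mirrors the expression above and is compared with Python's built-in, Unicode-aware `str.lower()`.

```python
def ascii_trick_lower(upper: str) -> str:
    """Hypothetical helper mirroring `upper - 'A' + 'a'` on a single character."""
    return chr(ord(upper) - ord("A") + ord("a"))

# Works for A-Z, where upper and lower case are exactly 32 code points apart.
print(ascii_trick_lower("Q"))            # q

# Not every pair is 32 apart: Ā (U+0100) and ā (U+0101) are adjacent.
print(ascii_trick_lower("Ā") == "ā")     # False
print("Ā".lower() == "ā")                # True

# Input that is not an uppercase letter at all is silently mangled.
print(ascii_trick_lower("1"))            # Q
print("1".lower())                       # 1
```

Many common pairs (Ä/ä, Δ/δ, Б/б) happen to share the offset of 32, which is part of why the trick can seem to work until it meets a character that does not.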

Answer 2:

An uppercase ß is not needed in the German language because the letter is never used as the first letter of a name or a word. For the rest, in some languages (French?) uppercase accented characters are not used, just the non-accented variant.
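Case conversion of ß still comes up when a whole word is written in capitals. Here is a short Python sketch of how Python's built-in string methods treat it; other environments may behave differently.

```python
word = "straße"

# Python's full case mapping expands ß to "SS", so the length changes.
upper = word.upper()
print(upper, len(word), len(upper))    # STRASSE 6 7

# The round trip does not restore the original spelling.
print(upper.lower())                   # strasse

# Unicode also has a capital sharp s, U+1E9E, which lowercases to ß.
print("\u1e9e".lower() == "ß")         # True
```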

Answer 3:

There are a lot of alphabets other than the usual Latin-derived western European alphabet most of us are used to seeing here. To start with, you'd need uppercase and lowercase versions of accented letters and ligatures, like Àà, Ĳĳ, and so on. There are also the fullwidth versions of Latin characters used when setting documents in Asian languages (which I'm too lazy to look up). Further, there are the other alphabets in use nowadays, like the Cyrillic (Бб) and Greek (Δδ) alphabets.
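A quick way to explore which characters have a lowercase counterpart is to ask the runtime. The following Python sketch checks the kinds of pairs mentioned above and then counts every code point whose `str.lower()` result differs from itself. The total depends on the Unicode version bundled with the interpreter, so no figure is quoted here.

```python
import sys

# Accented letter, IJ ligature (U+0132/U+0133), Cyrillic, Greek,
# and a fullwidth Latin A (U+FF21/U+FF41).
pairs = [("À", "à"), ("\u0132", "\u0133"), ("Б", "б"), ("Δ", "δ"),
         ("\uff21", "\uff41")]
for upper, lower in pairs:
    print(upper, lower, upper.lower() == lower, lower.upper() == upper)

# Collect every code point that changes under lower(), then see how few are ASCII.
changed = [chr(cp) for cp in range(sys.maxunicode + 1)
           if chr(cp).lower() != chr(cp)]
ascii_only = [ch for ch in changed if ch.isascii()]
print(len(ascii_only), "of", len(changed), "are ASCII letters")
```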

There's also Turkish, whose dotted and dotless i (İi and Iı) make casing just kind of difficult, according to Jeff Atwood. Using the uppercasing/lowercasing functions provided by your environment is (usually) the way to go with user-input data.
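The Turkish difficulty is that the language has both a dotted and a dotless i, so I/ı and İ/i are the case pairs rather than I/i. Python's `str.upper()` and `str.lower()` are not locale-sensitive, so the sketch below uses two hypothetical helpers, `turkish_upper` and `turkish_lower`, to show the difference. They are illustrative only; a real application would more likely rely on a locale-aware library.

```python
def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase using the Turkish i/İ and ı/I pairs."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()

def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase using the Turkish İ/i and I/ı pairs."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()

# Default (locale-independent) mapping versus the Turkish one.
print("istanbul".upper())           # ISTANBUL
print(turkish_upper("istanbul"))    # İSTANBUL
print("ISPARTA".lower())            # isparta
print(turkish_lower("ISPARTA"))     # ısparta
```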

Answer 4:

Any letter with an accent could have its own code point, or be a combination of more than one code point. For example, ÂËÕÝ are uppercase characters with lowercase equivalents.

The key is to implement the standards faithfully with respect to your users' locale settings, or get the same effect by using system libraries that handle the general case of toupper()/tolower() correctly.
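The single-code-point and combining forms of an accented letter can be compared in a few lines. This is a minimal Python sketch using the standard `unicodedata` module; the variable names are just for illustration.

```python
import unicodedata

precomposed = "\u00c2"    # Â as a single code point
combining = "A\u0302"     # A followed by COMBINING CIRCUMFLEX ACCENT

# They render the same but are different code point sequences.
print(precomposed == combining, len(precomposed), len(combining))   # False 1 2

# lower() handles both forms, keeping each one's composition style.
print(precomposed.lower() == "\u00e2")     # True
print(combining.lower() == "a\u0302")      # True

# Normalizing to NFC makes the two spellings compare equal.
nfc = unicodedata.normalize("NFC", combining.lower())
print(nfc == precomposed.lower())          # True
```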

Answer 5:

in some languages (French?) uppercase accented characters are not used (...)
[Reiner Bakels - Dec 10 '12 at 19:34]

Well, yes... but no!

In the good ol' times of manual "font" page making, that used to be true. Since an accented uppercase letter ("É" for example) would rise too high on a line, the usual practice was to drop the accent and just display "E" instead. So "des études" commonly appeared as "DES ETUDES" (without the accent).

But that is not recommended anymore. Whenever one can edit/type/publish the accented capital letters, we are invited to do so. Quebec's very official "Office de la langue française" has actually been promoting this for more than two decades!

That is becoming especially crucial in our era of computers and the Web, where texts are more and more processed (read & translated) by machines. Omitting accents can entirely change the meaning: tache (stain) -vs- tâche (task), du (of the) -vs- dû (owed, something you have to pay), and many, many more words. Continuing to omit the accents on uppercase letters is definitely not a good idea (even though it is a century-old legacy). Using them wherever it is now possible is a far better practice.
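The tache/tâche example can be checked directly. In this small Python sketch, `strip_accents` is a hypothetical helper that imitates the old habit of dropping accents on capitals; it is not a recommended practice.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Unicode-aware uppercasing keeps the accent, so the two words stay distinct.
print("tache".upper(), "tâche".upper())     # TACHE TÂCHE

# Dropping accents collapses them into the same string.
print(strip_accents("tâche".upper()) == "tache".upper())   # True
```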

Source: https://stackoverflow.com/questions/929079/unicode-lowercase-characters

Tags

unicode

uppercase

lowercase