String.equalsIgnoreCase - UpperCase v. LowerCase

后端 未结 1 2027
星月不相逢
星月不相逢 2021-02-05 14:17

I was browsing through the openjdk and noticed a weird code path in String.equalsIgnoreCase, specifically the method regionMatches:

if (ignoreCase) {
    // If c         


        
相关标签:
1条回答
  • 2021-02-05 14:37

    Now that the question is re-opened, I transfer my answer here.

    The short answer to "Why do they not just compare only lowercase instead of both upper and lower case, if it matches more cases than uppercase?": It does not match more character pairs, it merely matches different pairs.

    Comparing only uppercase is not enough, e.g. the ASCII letter "I" and the capital I with dot "İ" ((char)304, used in Turkish alphabet) have different uppercase (they are already uppercase), but they have the same lowercase letter "i". (Note that the Turkish language considers i with dot and i without dot as different letters, not just an accented letter, similar to German with its Umlauts ä/ö/ü vs. a/o/u.)

    Comparing only lowercase is not enough, e.g. the ASCII letter "i" and the small dotless i "ı" ((char)305). They have different lowercase (they are already lowercase), but they have the same uppercase letter "I".

    And finally, compare capital I with dot "İ" with small dotless i "ı". Neither their uppercases ("İ" vs. "I") nor their lowercases ("i" vs. "ı") match, but the lowercase of their uppercase is the same ("I"). I found another case if this phenomenon, in the greek letters "ϴ" and "ϑ" (char 1012 and 977).

    So a true case insensitive comparison can not even check uppercases and lowercases of the original characters, but must check the lowercases of the uppercases.

    0 讨论(0)
提交回复
热议问题