I was browsing through the openjdk and noticed a weird code path in String.equalsIgnoreCase, specifically the method regionMatches:
if (ignoreCase) {
// If c
Now that the question is re-opened, I transfer my answer here.
The short answer to "Why do they not just compare only lowercase instead of both upper and lower case, if it matches more cases than uppercase?": It does not match more character pairs, it merely matches different pairs.
Comparing only uppercase is not enough, e.g. the ASCII letter "I" and the capital I with dot "İ" ((char)304
, used in Turkish alphabet) have different uppercase (they are already uppercase), but they have the same lowercase letter "i". (Note that the Turkish language considers i with dot and i without dot as different letters, not just an accented letter, similar to German with its Umlauts ä/ö/ü vs. a/o/u.)
Comparing only lowercase is not enough, e.g. the ASCII letter "i" and the small dotless i "ı" ((char)305
). They have different lowercase (they are already lowercase), but they have the same uppercase letter "I".
And finally, compare capital I with dot "İ" with small dotless i "ı". Neither their uppercases ("İ" vs. "I") nor their lowercases ("i" vs. "ı") match, but the lowercase of their uppercase is the same ("I"). I found another case if this phenomenon, in the greek letters "ϴ" and "ϑ" (char 1012 and 977).
So a true case insensitive comparison can not even check uppercases and lowercases of the original characters, but must check the lowercases of the uppercases.