Question
The code below is from the String class in Java. I don't understand why characters from the two strings are compared twice: first after converting them to upper case and, if that comparison fails, again after converting them to lower case.
My question is: is the second comparison required? If so, why?
```java
public static final Comparator<String> CASE_INSENSITIVE_ORDER
        = new CaseInsensitiveComparator();

private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    // use serialVersionUID from JDK 1.2.2 for interoperability
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }
}
```
Answer 1:
The issue is more subtle than it looks.
Some characters have several lowercase code points that map to the same uppercase code point, or vice versa. So to test for a case-insensitive match, you need to compare both the uppercase and the lowercase forms, and treat the characters as equal if either comparison matches.
One example:
The Greek upper-case letter "Σ" has two different lower-case forms: "ς" in word-final position and "σ" elsewhere.
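To make the sigma case concrete, here is a small sketch. `SigmaDemo`, `equalByUpperCase`, and `equalByLowerCase` are hypothetical names for single-pass checks, not JDK code; only `Character.toUpperCase`, `Character.toLowerCase`, and `String.CASE_INSENSITIVE_ORDER` are real APIs. It suggests that a lower-case-only comparison would miss ς vs σ, and that the upper-case pass is what catches the pair.

```java
public class SigmaDemo {
    // Hypothetical single-pass check: compare only the lower-case forms.
    static boolean equalByLowerCase(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // Hypothetical single-pass check: compare only the upper-case forms.
    static boolean equalByUpperCase(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    public static void main(String[] args) {
        char finalSigma = '\u03C2';   // ς, word-final small sigma
        char smallSigma = '\u03C3';   // σ, small sigma
        char capitalSigma = '\u03A3'; // Σ, capital sigma

        // Both small forms upper-case to Σ...
        System.out.println(equalByUpperCase(finalSigma, smallSigma)); // true
        // ...but lower-casing leaves them as two distinct characters.
        System.out.println(equalByLowerCase(finalSigma, smallSigma)); // false
        // Σ lower-cases to σ, not ς, so a lower-case-only check misses this pair too.
        System.out.println(equalByLowerCase(finalSigma, capitalSigma)); // false
        // The JDK comparator tries upper case first, so it treats ς and σ as equal.
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("\u03C2", "\u03C3")); // 0
    }
}
```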
Source: Wikipedia
For the opposite situation, where the uppercase forms differ but the lowercase forms are equal, VGR supplied this excellent example:
A better example would be '\u0130' (İ) and 'I'. Passing them through toUpperCase leaves them unchanged (and therefore different), but passing them through toLowerCase results in identical character values.
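The İ case can be sketched the same way. `DottedCapitalIDemo` and `twoStepEqual` are hypothetical names; `twoStepEqual` is meant to mirror the per-character test in the question's loop (upper-case first, then lower-case the already upper-cased values). `Character.toLowerCase(char)` is locale-independent and uses a one-to-one mapping, so İ becomes plain `i`, and the pair only matches on the second step.

```java
public class DottedCapitalIDemo {
    // Mirrors the per-character test in the question's compare loop:
    // try upper case first, then lower-case the upper-cased values.
    static boolean twoStepEqual(char c1, char c2) {
        if (c1 == c2) return true;
        c1 = Character.toUpperCase(c1);
        c2 = Character.toUpperCase(c2);
        if (c1 == c2) return true;
        c1 = Character.toLowerCase(c1);
        c2 = Character.toLowerCase(c2);
        return c1 == c2;
    }

    public static void main(String[] args) {
        char dottedI = '\u0130'; // İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
        char plainI = 'I';

        // Both are already upper case, so toUpperCase changes neither.
        System.out.println(Character.toUpperCase(dottedI) == Character.toUpperCase(plainI)); // false
        // Both lower-case to 'i' (U+0069).
        System.out.println(Character.toLowerCase(dottedI) == Character.toLowerCase(plainI)); // true
        // So only the second (lower-case) step finds the match.
        System.out.println(twoStepEqual(dottedI, plainI)); // true
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("\u0130", "I")); // 0
    }
}
```

Taken together with the sigma sketch, this is why neither pass alone is enough: ς/σ needs the upper-case step and İ/I needs the lower-case step.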
Source: https://stackoverflow.com/questions/34613630/why-is-the-same-character-compared-twice-by-changing-its-case-to-upper-and-then