Why is the same character compared twice by changing its case to UPPER and then to lower?

Submitted by 人走茶凉 on 2021-01-19 03:14:02

Question


The code below is from the String class in Java. I don't understand why the characters from the two strings are compared twice: first after converting both to upper case and, if they still differ, again after converting both to lower case.

My question is: is the second comparison required? If so, why?

    public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                         = new CaseInsensitiveComparator();
    private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }
    }

Answer 1:


Case mapping is more complex than a simple one-to-one relationship between an uppercase and a lowercase character.

For some characters, several lowercase code points map to the same uppercase code point, or vice versa. So to check for a case-insensitive match, you need to compare both the uppercase and the lowercase versions and treat the characters as equal if either comparison matches.

One example:

The Greek upper-case letter "Σ" has two different lower-case forms: "ς" in word-final position and "σ" elsewhere.

Source: Wikipedia
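To make the sigma case concrete, here is a minimal sketch of what happens to the two lowercase forms. The class name SigmaCaseDemo and the two helper methods are made up for illustration; only Character.toUpperCase, Character.toLowerCase and String.CASE_INSENSITIVE_ORDER come from the JDK.

```java
public class SigmaCaseDemo {
    // Hypothetical helper: are the two chars equal once both are upper-cased?
    static boolean equalAfterUpper(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Hypothetical helper: are the two chars equal once both are lower-cased?
    static boolean equalAfterLower(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char sigma = '\u03C3';      // GREEK SMALL LETTER SIGMA
        char finalSigma = '\u03C2'; // GREEK SMALL LETTER FINAL SIGMA

        // Both are already lowercase, so lower-casing leaves them different.
        System.out.println(equalAfterLower(sigma, finalSigma)); // false

        // Both upper-case to U+03A3, so the uppercase comparison matches.
        System.out.println(equalAfterUpper(sigma, finalSigma)); // true

        // The comparator from the question therefore treats them as equal.
        System.out.println(
            String.CASE_INSENSITIVE_ORDER.compare("\u03C3", "\u03C2")); // 0
    }
}
```

A comparison that only lower-cased both characters would report these two as different, which is why the uppercase step cannot be dropped.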

For the opposite situation, where the uppercase forms differ but the lowercase forms are equal, VGR supplied this excellent example:

A better example would be '\u0130' (İ) and 'I'. Passing them through toUpperCase leaves them unchanged (and therefore different), but passing them through toLowerCase results in identical character values
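This one is easy to check yourself. The sketch below is only an illustration: the class name DottedIDemo and the method compareChars are hypothetical, and compareChars simply repeats the per-character steps from the loop in the question so each stage can be observed.

```java
public class DottedIDemo {
    // Hypothetical helper mirroring the loop body from the question:
    // exact match, then upper case, then lower case.
    static int compareChars(char c1, char c2) {
        if (c1 != c2) {
            c1 = Character.toUpperCase(c1);
            c2 = Character.toUpperCase(c2);
            if (c1 != c2) {
                c1 = Character.toLowerCase(c1);
                c2 = Character.toLowerCase(c2);
                if (c1 != c2) {
                    return c1 - c2;
                }
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        char dottedI = '\u0130'; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        char plainI = 'I';

        // Both are already uppercase, so the uppercase step leaves them different.
        System.out.println(
            Character.toUpperCase(dottedI) == Character.toUpperCase(plainI)); // false

        // Both lower-case to 'i', so the lowercase step finds the match.
        System.out.println(
            Character.toLowerCase(dottedI) == Character.toLowerCase(plainI)); // true

        System.out.println(compareChars(dottedI, plainI));   // 0
        System.out.println("\u0130".equalsIgnoreCase("I"));  // true
    }
}
```

Without the second, lowercase comparison, compareChars would return a non-zero value for this pair.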



Source: https://stackoverflow.com/questions/34613630/why-is-the-same-character-compared-twice-by-changing-its-case-to-upper-and-then
