Java collation ignores space

伪装坚强ぢ 2020-12-16 14:07

I recently became aware that Java collation seems to ignore spaces.

I have a list of the following terms:

Amman Jost 
Ammann Heinrich 
Ammanner Jose         


        
2 answers
  • 2020-12-16 14:38

    If you cannot modify the locale for some reason, I would suggest writing the comparison yourself. Here are some ideas; treat the code as an outline rather than a finished implementation:

    • Instead of keeping a list of Strings, wrap each name in your own class that implements Comparable:

      public class myString implements Comparable<myString> {
          private String name;
      
          public myString(String name) {
             this.name = name;
          }
      }
      
    • Then you will need to implement compareTo (see an example here):

      public int compareTo(myString compareMyString) {
          ...
      }
      
    • Now comes the trickier part:

      • In order to compare your strings, you will need to split them (this will result in an array of Strings). For instance:

        // Original String
        "Barr Burt"
        
        // Split String
        [0]: "Barr"
        [1]: "Burt"
        
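      • You will need to compare the words one after the other. Create a method along these lines (here "this.words[i]" is the i-th word of "this.name"):

        // Assumes a field "String[] words" holding name.trim().split("\\s+"), set in the constructor.
        public int compareWords(myString compareMyString, int i) {
            // If either name has run out of words, the name with fewer words comes first
            if (i >= this.words.length || i >= compareMyString.words.length)
                return Integer.compare(this.words.length, compareMyString.words.length);

            // Strings cannot be compared with < or >; use compareTo (or a Collator)
            int result = this.words[i].compareTo(compareMyString.words[i]);
            if (result != 0)
                return result; // negative: "this" comes first; positive: "this" comes after

            // The words are equal: move on to the next word
            return compareWords(compareMyString, i + 1);
        }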
        
      • And then compareTo:

        public int compareTo(myString compareMyString) {
            return compareWords(compareMyString, 0);
        }
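
Putting these steps together, a minimal runnable sketch might look like the following. It is written as a Comparator rather than a Comparable wrapper, and it hands each word to a locale-aware Collator instead of String.compareTo; both choices are assumptions of this sketch, not part of the outline above, and the name WordwiseComparator is made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: compares names word by word, so a space always
// ends a word instead of being skipped by the collator.
final class WordwiseComparator implements Comparator<String> {
    private final Collator collator;

    WordwiseComparator(Locale locale) {
        this.collator = Collator.getInstance(locale);
    }

    @Override
    public int compare(String a, String b) {
        String[] wordsA = a.trim().split("\\s+");
        String[] wordsB = b.trim().split("\\s+");
        int shared = Math.min(wordsA.length, wordsB.length);
        for (int i = 0; i < shared; i++) {
            // Each word contains no spaces, so the collator's handling of spaces is irrelevant here
            int result = collator.compare(wordsA[i], wordsB[i]);
            if (result != 0) {
                return result;
            }
        }
        // All shared words are equal: the name with fewer words comes first
        return Integer.compare(wordsA.length, wordsB.length);
    }

    // Returns a sorted copy and leaves the input list untouched
    static List<String> sorted(List<String> names, Locale locale) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(new WordwiseComparator(locale));
        return copy;
    }
}
```

Calling WordwiseComparator.sorted(names, Locale.GERMANY) on the three names from the question should give Amman Jost, Ammann Heinrich, Ammanner Jose, because each surname is compared as a whole word before the first name is considered.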
        
  • 2020-12-16 14:43

    You can customize the collation. Try looking at the source code to see how the Collator for the German locale is built, as described in this answer.

    Then adapt it to your needs. The tutorial gives a starting point. But there is no need to do all the work yourself, because someone else has already done it: see this blog post dealing with the exact same problem for Czech.

    The essence of the solution linked above is:

    String rules = ((RuleBasedCollator) Collator.getInstance(Locale.GERMANY)).getRules();
    RuleBasedCollator correctedCollator 
        = new RuleBasedCollator(rules.replaceAll("<'\u005f'", "<' '<'\u005f'"));
    

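    This adds a rule for the space character just before the rule for underscore; the intent is that a space is then treated as a regular character that sorts before the underscore instead of being ignored. Note that the RuleBasedCollator constructor throws a checked ParseException, so the call has to handle or declare it.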

    I confess I haven't tested this personally.
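
For completeness, here is how that snippet could be wrapped into a helper and used for sorting. Like the snippet itself, this is an untested sketch: it assumes that Collator.getInstance returns a RuleBasedCollator for the requested locale and that its rules contain the underscore entry in the form "<'_'". The class and method names are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper built around the rule patch shown above.
final class SpaceAwareCollation {
    static Collator spaceAwareCollator(Locale locale) {
        Collator base = Collator.getInstance(locale);
        if (!(base instanceof RuleBasedCollator)) {
            return base; // no rules to patch, so fall back to the unmodified collator
        }
        String rules = ((RuleBasedCollator) base).getRules();
        // Assumption: the rules contain "<'_'"; if they do not, replace() leaves them unchanged
        String patched = rules.replace("<'_'", "<' '<'_'");
        try {
            return new RuleBasedCollator(patched);
        } catch (ParseException e) {
            throw new IllegalStateException("Patched collation rules did not parse", e);
        }
    }

    // Returns a sorted copy; Collator implements Comparator<Object>, so it can be passed to sort directly
    static List<String> sorted(List<String> names, Locale locale) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(spaceAwareCollator(locale));
        return copy;
    }
}
```

Since the resulting order depends on the rules shipped with the JDK in use, it is worth checking the output of SpaceAwareCollation.sorted(names, Locale.GERMANY) against the names from the question before relying on it.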
