Sort on a string that may contain a number

走了就别回头了 2020-11-22 02:59

I need to write a Java Comparator class that compares Strings, however with one twist. If the two strings it is comparing are the same at the beginning and end of the strin

23 Answers
  • 2020-11-22 03:34

    On Linux, glibc provides strverscmp(); it is also available from gnulib for portability. However, truly "human" sorting has many other quirks, such as "The Beatles" being sorted as "Beatles, The". There is no simple solution to the general problem.

  • 2020-11-22 03:38

    Although the question asked for a Java solution, here is one for anyone who wants Scala:

    object Alphanum {
    
       // Zero-width split points: every boundary between a digit and a non-digit.
       private[this] val regex = "((?<=[0-9])(?=[^0-9]))|((?<=[^0-9])(?=[0-9]))"
    
       // Orders two tokens: numerically when both are all digits, lexicographically otherwise.
       private[this] val alphaNum: Ordering[String] = Ordering.fromLessThan((ss1: String, ss2: String) => (ss1, ss2) match {
         case (sss1, sss2) if sss1.matches("[0-9]+") && sss2.matches("[0-9]+") => sss1.toLong < sss2.toLong
         case (sss1, sss2) => sss1 < sss2
       })
    
       // Orders whole strings by comparing their token lists element by element.
       def ordering: Ordering[String] = Ordering.fromLessThan((s1: String, s2: String) => {
         import Ordering.Implicits.infixOrderingOps
         implicit val ord: Ordering[List[String]] = Ordering.Implicits.seqDerivedOrdering(alphaNum)
    
         s1.split(regex).toList < s2.split(regex).toList
       })
    
    }
    
  • 2020-11-22 03:40

    Ian Griffiths of Microsoft has a C# implementation he calls Natural Sorting. Porting to Java should be fairly easy, easier than from C anyway!

    UPDATE: There seems to be a Java example on eekboom that does this; see "compareNatural" and use it as the comparator for your sort.

  • 2020-11-22 03:41

    Short answer: based on the context, I can't tell whether this is just some quick-and-dirty code for personal use, or a key part of Goldman Sachs' latest internal accounting software, so I'll open by saying: eww. That's a rather funky sorting algorithm; try to use something a bit less "twisty" if you can.

    Long answer:

    The two issues that immediately come to mind in your case are performance, and correctness. Informally, make sure it's fast, and make sure your algorithm is a total ordering.

    (Of course, if you're not sorting more than about 100 items, you can probably disregard this paragraph.) Performance matters, because the speed of the comparator will be the largest factor in the speed of your sort (assuming the sort algorithm is "ideal" for the typical list). In your case, the comparator's speed will depend mainly on the length of the strings. The strings seem to be fairly short, so their length probably won't matter as much as the size of your list.

    Turning each string into a string-number-string tuple and then sorting this list of tuples, as suggested in another answer, will fail in some of your cases, since you apparently will have strings with multiple numbers appearing.

    The other problem is correctness. Specifically, if the algorithm you described ever permits A > B > ... > A (that is, if the ordering is not transitive), then your sort will be non-deterministic. In your case, I fear that it might, though I can't prove it. Consider some parsing cases such as:

      aa 0 aa
      aa 23aa
      aa 2a3aa
      aa 113aa
      aa 113 aa
      a 1-2 a
      a 13 a
      a 12 a
      a 2-3 a
      a 21 a
      a 2.3 a
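
    One way to look for such a cycle, as a minimal sketch: brute-force the comparator over a sample of inputs like the ones above and check sign symmetry and transitivity. The names ComparatorContractCheck and firstViolation are hypothetical, and String.CASE_INSENSITIVE_ORDER only stands in for the comparator under test. A clean result covers only the samples passed in; it proves nothing about other inputs.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of any library.
final class ComparatorContractCheck {

    /**
     * Returns a description of the first contract violation found among the
     * samples, or null if none is found. Uses O(n^3) comparisons.
     */
    public static <T> String firstViolation(Comparator<? super T> cmp, List<T> samples) {
        for (T a : samples) {
            for (T b : samples) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                // compare(a, b) and compare(b, a) must have opposite signs.
                if (ab != -ba) {
                    return "asymmetric: " + a + " vs " + b;
                }
                for (T c : samples) {
                    // a > b and b > c must imply a > c.
                    if (ab > 0 && cmp.compare(b, c) > 0 && cmp.compare(a, c) <= 0) {
                        return "not transitive: " + a + " > " + b + " > " + c;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
                "aa 0 aa", "aa 23aa", "aa 2a3aa", "a 13 a", "a 12 a", "a 2.3 a");
        // Replace the stand-in comparator with the one being tested.
        System.out.println(firstViolation(String.CASE_INSENSITIVE_ORDER, samples));
    }
}
```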
    
  • 2020-11-22 03:42

    In your given example, the numbers you want to compare have spaces around them while the other numbers do not, so why would a regular expression not work?

    bbb 12 ccc

    vs.

    eee 12 ffffd jpeg2000 eee
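
    A minimal sketch of that regex idea. It assumes the number to compare is the first run of digits with a single space on each side, and it orders strings by (text before the number, the number's value, text after the number). That reading of the requirement is an assumption, since the question is cut off above, and SpacedNumberComparator is a hypothetical name. Comparing by a fixed key like this keeps the ordering transitive.

```java
import java.math.BigInteger;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparator illustrating the "number surrounded by spaces" idea.
final class SpacedNumberComparator implements Comparator<String> {

    // Lazy prefix, then the first space-delimited run of digits, then the rest.
    private static final Pattern SPACED_NUMBER =
            Pattern.compile("(.*?) (\\d+) (.*)", Pattern.DOTALL);

    @Override
    public int compare(String s1, String s2) {
        Matcher m1 = SPACED_NUMBER.matcher(s1);
        Matcher m2 = SPACED_NUMBER.matcher(s2);
        boolean has1 = m1.matches();
        boolean has2 = m2.matches();

        // A string with no spaced number is treated as being all prefix.
        String prefix1 = has1 ? m1.group(1) : s1;
        String prefix2 = has2 ? m2.group(1) : s2;
        int diff = prefix1.compareTo(prefix2);
        if (diff != 0) {
            return diff;
        }
        // Equal prefixes: a string without a number sorts before one with a number.
        if (has1 != has2) {
            return has1 ? 1 : -1;
        }
        if (!has1) {
            return 0; // neither has a number, so the strings themselves are equal
        }
        // Compare the numbers by value; BigInteger avoids overflow on long digit runs.
        diff = new BigInteger(m1.group(2)).compareTo(new BigInteger(m2.group(2)));
        if (diff != 0) {
            return diff;
        }
        diff = m1.group(3).compareTo(m2.group(3));
        // Final tie-break (e.g. "a 01 b" vs "a 1 b") keeps the result deterministic.
        return diff != 0 ? diff : s1.compareTo(s2);
    }
}
```

    With this sketch, "bbb 2 ccc" sorts before "bbb 12 ccc", and the 2000 in "jpeg2000" is ignored because it has no space in front of it. A number at the very start or end of the string is not matched either.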

  • 2020-11-22 03:44

    The Alphanum algorithm is nice, but it did not match the requirements of a project I'm working on: I need to sort negative numbers and decimals correctly. Here is the implementation I came up with. Any feedback would be much appreciated.

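    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Scanner;
    import java.util.regex.Pattern;
    
    public class StringAsNumberComparator implements Comparator<String> {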
    
        public static final Pattern NUMBER_PATTERN = Pattern.compile("(\\-?\\d+\\.\\d+)|(\\-?\\.\\d+)|(\\-?\\d+)");
    
        /**
         * Splits strings into parts sorting each instance of a number as a number if there is
         * a matching number in the other String.
         * 
         * For example A1B, A2B, A11B, A11B1, A11B2, A11B11 will be sorted in that order instead
         * of alphabetically which will sort A1B and A11B together.
         */
        public int compare(String str1, String str2) {
            if(str1 == str2) return 0;
            else if(str1 == null) return 1;
            else if(str2 == null) return -1;
    
            List<String> split1 = split(str1);
            List<String> split2 = split(str2);
            int diff = 0;
    
            for(int i = 0; diff == 0 && i < split1.size() && i < split2.size(); i++) {
                String token1 = split1.get(i);
                String token2 = split2.get(i);
    
                if (NUMBER_PATTERN.matcher(token1).matches() && NUMBER_PATTERN.matcher(token2).matches()) {
                    diff = (int) Math.signum(Double.parseDouble(token1) - Double.parseDouble(token2));
                } else {
                    diff = token1.compareToIgnoreCase(token2);
                }
            }
            if(diff != 0) {
                return diff;
            } else {
                return split1.size() - split2.size();
            }
        }
    
        /**
         * Splits a string into strings and number tokens.
         */
        private List<String> split(String s) {
            List<String> list = new ArrayList<String>();
            try (Scanner scanner = new Scanner(s)) {
                int index = 0;
                String num = null;
                while ((num = scanner.findInLine(NUMBER_PATTERN)) != null) {
                    int indexOfNumber = s.indexOf(num, index);
                    if (indexOfNumber > index) {
                        list.add(s.substring(index, indexOfNumber));
                    }
                    list.add(num);
                    index = indexOfNumber + num.length();
                }
                if (index < s.length()) {
                    list.add(s.substring(index));
                }
            }
            return list;
        }
    }
    

    PS. I wanted to use the java.lang.String.split() method and use "lookahead/lookbehind" to keep the tokens, but I could not get it to work with the regular expression I was using.
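
    For comparison, lookaround splitting does work when the tokens are plain unsigned integers; this is the same boundary regex the Scala answer above uses. A minimal sketch, with LookaroundSplit and tokens as hypothetical names. It does not cover the negative numbers and decimals this answer needs: a zero-width split on digit boundaries cannot tell whether a "-" or "." belongs to the number.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing String.split() with lookbehind/lookahead.
final class LookaroundSplit {

    // Zero-width boundary between a digit and a non-digit, in either direction.
    private static final String BOUNDARY = "(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)";

    /** Splits s into alternating digit and non-digit tokens, keeping every character. */
    public static List<String> tokens(String s) {
        return Arrays.asList(s.split(BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(tokens("A11B2")); // prints [A, 11, B, 2]
        // Signs and decimal points end up in separate tokens from their digits.
        System.out.println(tokens("a -1.5 b"));
    }
}
```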
