Sort on a string that may contain a number

走了就别回头了 2020-11-22 02:59

I need to write a Java Comparator class that compares Strings, however with one twist. If the two strings it is comparing are the same at the beginning and end of the strin

23 answers
  • 2020-11-22 03:49

    The Alphanum Algorithm

    From the website

    "People sort strings with numbers differently than software. Most sorting algorithms compare ASCII values, which produces an ordering that is inconsistent with human logic. Here's how to fix it."

    Edit: Here's a link to the Java Comparator Implementation from that site.
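
To make the quoted problem concrete, here is a minimal sketch of what the default `String` ordering does with numbered file names. The class and method names (`AsciiOrderDemo`, `sortedByDefault`) and the sample file names are made up for illustration and are not taken from the Alphanum site.

```java
import java.util.Arrays;

final class AsciiOrderDemo {
    /** Returns a sorted copy using String.compareTo, i.e. plain code-unit order. */
    static String[] sortedByDefault(String... names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // '1' sorts before '2', so "z10.doc" lands ahead of "z2.doc".
        System.out.println(Arrays.toString(sortedByDefault("z2.doc", "z10.doc", "z1.doc")));
        // prints [z1.doc, z10.doc, z2.doc]
    }
}
```

A person would expect `z1.doc, z2.doc, z10.doc`, which is the ordering a number-aware comparator is meant to produce.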

  • 2020-11-22 03:51

    Here is the solution with the following advantages over Alphanum Algorithm:

    1. 3.25 times faster (tested on the data from the 'Epilogue' chapter of the Alphanum description)
    2. Does not consume extra memory (no string splitting, no numbers parsing)
    3. Processes leading zeros correctly (e.g. "0001" equals "1", "01234" is less than "4567")

    import java.util.Comparator;

    public class NumberAwareComparator implements Comparator<String>
    {
        @Override
        public int compare(String s1, String s2)
        {
            int len1 = s1.length();
            int len2 = s2.length();
            int i1 = 0;
            int i2 = 0;
            while (true)
            {
                // handle the case when one string is longer than another
                if (i1 == len1)
                    return i2 == len2 ? 0 : -1;
                if (i2 == len2)
                    return 1;
    
                char ch1 = s1.charAt(i1);
                char ch2 = s2.charAt(i2);
                if (Character.isDigit(ch1) && Character.isDigit(ch2))
                {
                    // skip leading zeros
                    while (i1 < len1 && s1.charAt(i1) == '0')
                        i1++;
                    while (i2 < len2 && s2.charAt(i2) == '0')
                        i2++;
    
                    // find the ends of the numbers
                    int end1 = i1;
                    int end2 = i2;
                    while (end1 < len1 && Character.isDigit(s1.charAt(end1)))
                        end1++;
                    while (end2 < len2 && Character.isDigit(s2.charAt(end2)))
                        end2++;
    
                    int diglen1 = end1 - i1;
                    int diglen2 = end2 - i2;
    
                    // if the lengths are different, then the longer number is bigger
                    if (diglen1 != diglen2)
                        return diglen1 - diglen2;
    
                    // compare numbers digit by digit
                    while (i1 < end1)
                    {
                        if (s1.charAt(i1) != s2.charAt(i2))
                            return s1.charAt(i1) - s2.charAt(i2);
                        i1++;
                        i2++;
                    }
                }
                else
                {
                    // plain characters comparison
                    if (ch1 != ch2)
                        return ch1 - ch2;
                    i1++;
                    i2++;
                }
            }
        }
    }
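
The numeric step of the comparator above can also be looked at in isolation. The following is a rough sketch, not part of the original answer: `DigitRunCompare` and `firstNonZero` are hypothetical names, and the inputs are assumed to consist of decimal digits only. It shows why no parsing is needed: once leading zeros are skipped, the longer digit run is the larger number, and runs of equal length can be compared character by character.

```java
final class DigitRunCompare {
    /** Compares two all-digit strings by numeric value without parsing them. */
    static int compare(String a, String b) {
        int i = firstNonZero(a);
        int j = firstNonZero(b);
        int lenA = a.length() - i;
        int lenB = b.length() - j;
        // With leading zeros gone, more digits means a bigger number.
        if (lenA != lenB) {
            return Integer.compare(lenA, lenB);
        }
        // Same number of digits: the first differing digit decides.
        for (; i < a.length(); i++, j++) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (ca != cb) {
                return Character.compare(ca, cb);
            }
        }
        return 0;
    }

    /** Index of the first character that is not '0' (may equal s.length()). */
    private static int firstNonZero(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == '0') {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(compare("0001", "1"));         // 0, the values are equal
        System.out.println(compare("01234", "4567") < 0); // true
        // Works for values too large for long, since nothing is parsed.
        System.out.println(compare("99999999999999999999", "100000000000000000000") < 0); // true
    }
}
```

Because only lengths and single characters are compared, the approach has no overflow limit and allocates nothing, which matches the advantages listed above.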
    
  • 2020-11-22 03:52

    Instead of reinventing the wheel, I'd suggest using the locale-aware, Unicode-compliant string comparator from the ICU4J library, which has built-in number sorting.

    import com.ibm.icu.text.Collator;
    import com.ibm.icu.text.RuleBasedCollator;
    
    import java.util.Arrays;
    import java.util.List;
    import java.util.Locale;
    
    public class CollatorExample {
        public static void main(String[] args) {
            // Make sure to choose correct locale: in Turkish uppercase of "i" is "İ", not "I"
            RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
            collator.setNumericCollation(true); // Place "10" after "2"
            collator.setStrength(Collator.PRIMARY); // Case-insensitive
            List<String> strings = Arrays.asList("10", "20", "A20", "2", "t1ab", "01", "T010T01", "t1aB",
                "_2", "001", "_200", "1", "A 02", "t1Ab", "a2", "_1", "t1A", "_01",
                "100", "02", "T0010T01", "t1AB", "10", "A01", "010", "t1a"
            );
            strings.sort(collator);
            System.out.println(String.join(", ", strings));
            // Output: _1, _01, _2, _200, 01, 001, 1,
            // 2, 02, 10, 10, 010, 20, 100, A 02, A01, 
            // a2, A20, t1A, t1a, t1ab, t1aB, t1Ab, t1AB,
            // T010T01, T0010T01
        }
    }
    
  • 2020-11-22 03:53

    The implementation I propose here is simple and efficient. It allocates no extra memory, either directly or indirectly through regular expressions or methods such as substring(), split(), or toCharArray().

    This implementation first scans both strings, at full speed and with no special processing, for the first position where they differ. Number-specific comparison is triggered only when the characters at that position are both digits. A side effect of this implementation is that a digit is considered greater than any non-digit character, contrary to the default lexicographic order.

    public static final int compareNatural (String s1, String s2)
    {
       // Skip all identical characters
       int len1 = s1.length();
       int len2 = s2.length();
       int i;
       char c1, c2;
       for (i = 0, c1 = 0, c2 = 0; (i < len1) && (i < len2) && (c1 = s1.charAt(i)) == (c2 = s2.charAt(i)); i++);
    
       // Check end of string
       if (c1 == c2)
          return(len1 - len2);
    
       // Check digit in first string
       if (Character.isDigit(c1))
       {
          // Check digit only in first string 
          if (!Character.isDigit(c2))
             return(1);
    
          // Scan all integer digits
          int x1, x2;
          for (x1 = i + 1; (x1 < len1) && Character.isDigit(s1.charAt(x1)); x1++);
          for (x2 = i + 1; (x2 < len2) && Character.isDigit(s2.charAt(x2)); x2++);
    
          // Longer integer wins, first digit otherwise
          return(x2 == x1 ? c1 - c2 : x1 - x2);
       }
    
       // Check digit only in second string
       if (Character.isDigit(c2))
          return(-1);
    
       // No digits
       return(c1 - c2);
    }
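
The side effect mentioned in this answer (a digit ranks above any non-digit) can be shown with a small stand-alone sketch. `DigitRankDemo` and `compareAtMismatch` are hypothetical names, and the case where both characters are digits is simplified here to a plain character difference; the function above scans the whole digit run in that case.

```java
final class DigitRankDemo {
    /**
     * Ordering rule applied at the first differing position.
     * Simplification: if both characters are digits, only these two are compared.
     */
    static int compareAtMismatch(char c1, char c2) {
        boolean d1 = Character.isDigit(c1);
        boolean d2 = Character.isDigit(c2);
        if (d1 != d2) {
            return d1 ? 1 : -1; // the digit side is treated as greater
        }
        return c1 - c2;
    }

    public static void main(String[] args) {
        // Default order: '1' (code 49) sorts before 'a' (code 97).
        System.out.println("1".compareTo("a") < 0);          // true
        // Rule used above: the digit sorts after the letter.
        System.out.println(compareAtMismatch('1', 'a') > 0); // true
    }
}
```

So with this rule "a1" sorts after "ab", whereas `String.compareTo` puts it before.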
    
  • 2020-11-22 03:53

    I realize you're working in Java, but you can take a look at how StrCmpLogicalW works. It's what Explorer uses to sort filenames in Windows. You can look at the WINE implementation here.
