Sort on a string that may contain a number

前端未结

关注

 23  2071

I need to write a Java Comparator class that compares Strings, however with one twist. If the two strings it is comparing are the same at the beginning and end of the strin

相关标签:

23条回答

自闭症患者

2020-11-22 03:49

The Alphanum Algorithm

From the website

"People sort strings with numbers differently than software. Most sorting algorithms compare ASCII values, which produces an ordering that is inconsistent with human logic. Here's how to fix it."

Edit: Here's a link to the Java Comparator Implementation from that site.

0 讨论(0)
发布评论:

提交评论
- 加载中...

滥情空心

2020-11-22 03:51

Here is the solution with the following advantages over Alphanum Algorithm:

3.25x times faster (tested on the data from 'Epilogue' chapter of Alphanum description)
Does not consume extra memory (no string splitting, no numbers parsing)
Processes leading zeros correctly (e.g. "0001" equals "1", "01234" is less than "4567")

public class NumberAwareComparator implements Comparator<String>
{
    @Override
    public int compare(String s1, String s2)
    {
        int len1 = s1.length();
        int len2 = s2.length();
        int i1 = 0;
        int i2 = 0;
        while (true)
        {
            // handle the case when one string is longer than another
            if (i1 == len1)
                return i2 == len2 ? 0 : -1;
            if (i2 == len2)
                return 1;

            char ch1 = s1.charAt(i1);
            char ch2 = s2.charAt(i2);
            if (Character.isDigit(ch1) && Character.isDigit(ch2))
            {
                // skip leading zeros
                while (i1 < len1 && s1.charAt(i1) == '0')
                    i1++;
                while (i2 < len2 && s2.charAt(i2) == '0')
                    i2++;

                // find the ends of the numbers
                int end1 = i1;
                int end2 = i2;
                while (end1 < len1 && Character.isDigit(s1.charAt(end1)))
                    end1++;
                while (end2 < len2 && Character.isDigit(s2.charAt(end2)))
                    end2++;

                int diglen1 = end1 - i1;
                int diglen2 = end2 - i2;

                // if the lengths are different, then the longer number is bigger
                if (diglen1 != diglen2)
                    return diglen1 - diglen2;

                // compare numbers digit by digit
                while (i1 < end1)
                {
                    if (s1.charAt(i1) != s2.charAt(i2))
                        return s1.charAt(i1) - s2.charAt(i2);
                    i1++;
                    i2++;
                }
            }
            else
            {
                // plain characters comparison
                if (ch1 != ch2)
                    return ch1 - ch2;
                i1++;
                i2++;
            }
        }
    }
}

0 讨论(0)

轮回少年

2020-11-22 03:52

Instead of reinventing the wheel, I'd suggest to use a locale-aware Unicode-compliant string comparator that has built-in number sorting from the ICU4J library.

import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorExample {
    public static void main(String[] args) {
        // Make sure to choose correct locale: in Turkish uppercase of "i" is "İ", not "I"
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        collator.setNumericCollation(true); // Place "10" after "2"
        collator.setStrength(Collator.PRIMARY); // Case-insensitive
        List<String> strings = Arrays.asList("10", "20", "A20", "2", "t1ab", "01", "T010T01", "t1aB",
            "_2", "001", "_200", "1", "A 02", "t1Ab", "a2", "_1", "t1A", "_01",
            "100", "02", "T0010T01", "t1AB", "10", "A01", "010", "t1a"
        );
        strings.sort(collator);
        System.out.println(String.join(", ", strings));
        // Output: _1, _01, _2, _200, 01, 001, 1,
        // 2, 02, 10, 10, 010, 20, 100, A 02, A01, 
        // a2, A20, t1A, t1a, t1ab, t1aB, t1Ab, t1AB,
        // T010T01, T0010T01
    }
}

0 讨论(0)

情书的邮戳

2020-11-22 03:53

The implementation I propose here is simple and efficient. It does not allocate any extra memory, directly or indirectly by using regular expressions or methods such as substring(), split(), toCharArray(), etc.

This implementation first goes across both strings to search for the first characters that are different, at maximal speed, without doing any special processing during this. Specific number comparison is triggered only when these characters are both digits. A side-effect of this implementation is that a digit is considered as greater than other letters, contrarily to default lexicographic order.

public static final int compareNatural (String s1, String s2)
{
   // Skip all identical characters
   int len1 = s1.length();
   int len2 = s2.length();
   int i;
   char c1, c2;
   for (i = 0, c1 = 0, c2 = 0; (i < len1) && (i < len2) && (c1 = s1.charAt(i)) == (c2 = s2.charAt(i)); i++);

   // Check end of string
   if (c1 == c2)
      return(len1 - len2);

   // Check digit in first string
   if (Character.isDigit(c1))
   {
      // Check digit only in first string 
      if (!Character.isDigit(c2))
         return(1);

      // Scan all integer digits
      int x1, x2;
      for (x1 = i + 1; (x1 < len1) && Character.isDigit(s1.charAt(x1)); x1++);
      for (x2 = i + 1; (x2 < len2) && Character.isDigit(s2.charAt(x2)); x2++);

      // Longer integer wins, first digit otherwise
      return(x2 == x1 ? c1 - c2 : x1 - x2);
   }

   // Check digit only in second string
   if (Character.isDigit(c2))
      return(-1);

   // No digits
   return(c1 - c2);
}

0 讨论(0)

走了就别回头了

2020-11-22 03:53

I realize you're in java, but you can take a look at how StrCmpLogicalW works. It's what Explorer uses to sort filenames in Windows. You can look at the WINE implementation here.

0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2 3 4