I need to write a Java Comparator class that compares Strings, however with one twist. If the two strings it is comparing are the same at the beginning and end of the strin
On Linux glibc provides strverscmp(), it's also available from gnulib for portability. However truly "human" sorting has lots of other quirks like "The Beatles" being sorted as "Beatles, The". There is no simple solution to this generic problem.
Although the question asked a java solution, for anyone who wants a scala solution:
object Alphanum {
private[this] val regex = "((?<=[0-9])(?=[^0-9]))|((?<=[^0-9])(?=[0-9]))"
private[this] val alphaNum: Ordering[String] = Ordering.fromLessThan((ss1: String, ss2: String) => (ss1, ss2) match {
case (sss1, sss2) if sss1.matches("[0-9]+") && sss2.matches("[0-9]+") => sss1.toLong < sss2.toLong
case (sss1, sss2) => sss1 < sss2
})
def ordering: Ordering[String] = Ordering.fromLessThan((s1: String, s2: String) => {
import Ordering.Implicits.infixOrderingOps
implicit val ord: Ordering[List[String]] = Ordering.Implicits.seqDerivedOrdering(alphaNum)
s1.split(regex).toList < s2.split(regex).toList
})
}
Ian Griffiths of Microsoft has a C# implementation he calls Natural Sorting. Porting to Java should be fairly easy, easier than from C anyway!
UPDATE: There seems to be a Java example on eekboom that does this, see the "compareNatural" and use that as your comparer to sorts.
Short answer: based on the context, I can't tell whether this is just some quick-and-dirty code for personal use, or a key part of Goldman Sachs' latest internal accounting software, so I'll open by saying: eww. That's a rather funky sorting algorithm; try to use something a bit less "twisty" if you can.
Long answer:
The two issues that immediately come to mind in your case are performance, and correctness. Informally, make sure it's fast, and make sure your algorithm is a total ordering.
(Of course, if you're not sorting more than about 100 items, you can probably disregard this paragraph.) Performance matters, as the speed of the comparator will be the largest factor in the speed of your sort (assuming the sort algorithm is "ideal" to the typical list). In your case, the comparator's speed will depend mainly on the size of the string. The strings seem to be fairly short, so they probably won't dominate as much as the size of your list.
Turning each string into a string-number-string tuple and then sorting this list of tuples, as suggested in another answer, will fail in some of your cases, since you apparently will have strings with multiple numbers appearing.
The other problem is correctness. Specifically, if the algorithm you described will ever permit A > B > ... > A, then your sort will be non-deterministic. In your case, I fear that it might, though I can't prove it. Consider some parsing cases such as:
aa 0 aa
aa 23aa
aa 2a3aa
aa 113aa
aa 113 aa
a 1-2 a
a 13 a
a 12 a
a 2-3 a
a 21 a
a 2.3 a
In your given example, the numbers you want to compare have spaces around them while the other numbers do not, so why would a regular expression not work?
bbb 12 ccc
vs.
eee 12 ffffd jpeg2000 eee
The Alphanum algrothim is nice, but it did not match requirements for a project I'm working on. I need to be able to sort negative numbers and decimals correctly. Here is the implementation I came up. Any feedback would be much appreciated.
public class StringAsNumberComparator implements Comparator<String> {
public static final Pattern NUMBER_PATTERN = Pattern.compile("(\\-?\\d+\\.\\d+)|(\\-?\\.\\d+)|(\\-?\\d+)");
/**
* Splits strings into parts sorting each instance of a number as a number if there is
* a matching number in the other String.
*
* For example A1B, A2B, A11B, A11B1, A11B2, A11B11 will be sorted in that order instead
* of alphabetically which will sort A1B and A11B together.
*/
public int compare(String str1, String str2) {
if(str1 == str2) return 0;
else if(str1 == null) return 1;
else if(str2 == null) return -1;
List<String> split1 = split(str1);
List<String> split2 = split(str2);
int diff = 0;
for(int i = 0; diff == 0 && i < split1.size() && i < split2.size(); i++) {
String token1 = split1.get(i);
String token2 = split2.get(i);
if((NUMBER_PATTERN.matcher(token1).matches() && NUMBER_PATTERN.matcher(token2).matches()) {
diff = (int) Math.signum(Double.parseDouble(token1) - Double.parseDouble(token2));
} else {
diff = token1.compareToIgnoreCase(token2);
}
}
if(diff != 0) {
return diff;
} else {
return split1.size() - split2.size();
}
}
/**
* Splits a string into strings and number tokens.
*/
private List<String> split(String s) {
List<String> list = new ArrayList<String>();
try (Scanner scanner = new Scanner(s)) {
int index = 0;
String num = null;
while ((num = scanner.findInLine(NUMBER_PATTERN)) != null) {
int indexOfNumber = s.indexOf(num, index);
if (indexOfNumber > index) {
list.add(s.substring(index, indexOfNumber));
}
list.add(num);
index = indexOfNumber + num.length();
}
if (index < s.length()) {
list.add(s.substring(index));
}
}
return list;
}
}
PS. I wanted to use the java.lang.String.split() method and use "lookahead/lookbehind" to keep the tokens, but I could not get it to work with the regular expression I was using.