Ruby alphanumeric sort not working as expected

Submitted by 冷眼眸甩不掉的悲伤 on 2020-01-30 08:55:33

Question


Given the following array:

y = %w[A1 A2 B5 B12 A6 A8 B10 B3 B4 B8]
=> ["A1", "A2", "B5", "B12", "A6", "A8", "B10", "B3", "B4", "B8"]

With the expected sorted array to be:

=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]

Using the following (vanilla) sort, I get:

irb(main):2557:0> y.sort{|a,b| puts "%s <=> %s = %s\n" % [a, b, a <=> b]; a <=> b}
A1 <=> A8 = -1
A8 <=> B8 = -1
A2 <=> A8 = -1
B5 <=> A8 = 1
B4 <=> A8 = 1
B3 <=> A8 = 1
B10 <=> A8 = 1
B12 <=> A8 = 1
A6 <=> A8 = -1
A1 <=> A2 = -1
A2 <=> A6 = -1
B12 <=> B3 = -1
B3 <=> B8 = -1
B5 <=> B3 = 1
B4 <=> B3 = 1
B10 <=> B3 = -1  # this appears to be wrong, looks like 1 is being compared, not 10.
B12 <=> B10 = 1
B5 <=> B4 = 1
B4 <=> B8 = -1
B5 <=> B8 = -1
=> ["A1", "A2", "A6", "A8", "B10", "B12", "B3", "B4", "B5", "B8"]

...which is obviously not what I desire. I know I can attempt to split on the alpha first and then sort the numerical, but it just seems like I shouldn't have to do that.

Possible big caveat: we're stuck using Ruby 1.8.7 for now :( But even Ruby 2.0.0 is doing the same thing. What am I missing here?

Suggestions?


Answer 1:


You are sorting strings, and strings are compared as strings, not as numbers. If you want numeric ordering, you have to sort numbers, not strings. The string 'B10' is lexicographically smaller than the string 'B3'. That is not unique to Ruby, or even to programming: it is how lexicographic ordering of text works pretty much everywhere, in programming languages, databases, lexicons, dictionaries, phone books, etc.
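A small illustration of that point, using only core String and Integer comparison (the variable names are just for this sketch):

```ruby
# String#<=> walks both strings character by character, left to right.
b10_vs_b3 = ("B10" <=> "B3")   # decided at the second character: "1" vs "3"
one_vs_3  = ("1" <=> "3")      # as characters, "1" sorts before "3"
ten_vs_3  = (10 <=> 3)         # as integers, 10 is greater than 3

string_order  = %w[B3 B10 B12].sort   # character-by-character order
numeric_order = [3, 10, 12].sort      # numeric order

p b10_vs_b3, one_vs_3, ten_vs_3
p string_order, numeric_order
```

The first difference between "B10" and "B3" is at the second character, so the trailing "0" is never looked at.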

You should split your strings into their numeric and non-numeric components, and convert the numeric components to numbers. Arrays are compared lexicographically, element by element, so sorting by those arrays will end up sorting exactly right:

y.sort_by { |s|        # use `sort_by` for a keyed sort, not `sort`
  s.
    split(/(\d+)/).    # split numeric parts from non-numeric ones
    map { |part|       # parse numeric parts as base-10 integers, keep the rest as strings
      begin Integer(part, 10); rescue ArgumentError; part end } }
#=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]
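To see which keys this kind of split produces, here is a minimal sketch. The helper name `natural_key` is hypothetical, and it swaps the `begin`/`rescue` for a regex test, which should behave the same for plain ASCII digit runs:

```ruby
# Hypothetical helper: split a string into text parts and Integer parts.
# The capture group in the regex makes String#split keep the digit runs.
def natural_key(string)
  string.split(/(\d+)/).map { |part| part =~ /\A\d+\z/ ? part.to_i : part }
end

key_b12  = natural_key("B12")    # one text part, one number
key_ab2c = natural_key("AB2C")   # text, number, text
key_12a  = natural_key("12A")    # a leading digit run yields an empty first part

y = %w[A1 A2 B5 B12 A6 A8 B10 B3 B4 B8]
sorted = y.sort_by { |s| natural_key(s) }

p key_b12, key_ab2c, key_12a
p sorted
```

Because a leading digit run produces an empty string as the first element, every key starts with a string, which keeps keys for inputs such as "12A" and "A12" comparable with each other.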



Answer 2:


If you know the maximum number of digits in your numbers, you can also zero-pad the numbers for the comparison.

y.sort_by { |string| string.gsub(/\d+/) { |digits| format('%02d', digits.to_i) } }
#=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]

Here '%02d' specifies the following: the % introduces a format specification, the 0 says to pad with 0s, the 2 is the minimum total width of the number, and the d requests a decimal (base 10) integer. See the documentation for Kernel#format for additional details.

This means that 'A1' will be converted to 'A01', 'B8' will become 'B08' and 'B12' will stay 'B12', since it already has 2 digits. This is only used during comparison.
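The fixed width is the fragile part of this approach. The sketch below, which is an illustration and not part of the answer, derives the width from the data and shows what happens when the width is too small. The helper name `padded_key` is hypothetical, and it pads with `String#rjust` rather than `format`:

```ruby
# Hypothetical helper: left-pad every digit run with zeros up to `width`.
def padded_key(string, width)
  string.gsub(/\d+/) { |digits| digits.rjust(width, "0") }
end

y = %w[A1 A2 B5 B12 A6 A8 B10 B3 B4 B8]

# Assumption: the width is taken from the longest digit run in the input
# instead of being hard-coded.
width  = y.join(" ").scan(/\d+/).map { |digits| digits.size }.max
sorted = y.sort_by { |s| padded_key(s, width) }

# With a width of 2, "B100" is left unpadded and still sorts before "B20".
too_narrow = %w[B100 B20].sort_by { |s| padded_key(s, 2) }
wide_ok    = %w[B100 B20].sort_by { |s| padded_key(s, 3) }

p width, sorted, too_narrow, wide_ok
```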




Answer 3:


Here are a couple of ways to do that.

arr = ["A1", "A2", "B5", "B12", "A6", "AB12", "A8", "B10", "B3", "B4",
       "B8", "AB2"]

Sort on a 2-element array

arr.sort_by { |s| [s[/\D+/], s[/\d+/].to_i] }
  #=> ["A1", "A2", "A6", "A8", "AB2", "AB12", "B3", "B4", "B5", "B8",
  #    "B10", "B12"] 

This is similar to @Jorg's solution except I've computed the two elements of the comparison array separately, rather than splitting the string into two parts and converting the latter to an integer.

Enumerable#sort_by compares each pair of elements of arr with the spaceship method, <=>. As the elements being compared are arrays, the method Array#<=> is used. See in particular the third paragraph of that doc.
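A short sketch of how Array#<=> behaves on keys of this shape (the variable names are only for illustration):

```ruby
# Array#<=> compares element by element and stops at the first difference.
first_differs  = (["A", 12] <=> ["B", 3])    # decided by "A" <=> "B"
second_differs = (["B", 10] <=> ["B", 3])    # first elements equal, decided by 10 <=> 3
prefix_shorter = (["A"] <=> ["A", 1])        # equal prefix, so the shorter array sorts first
mixed_types    = (["A", 1] <=> ["A", "1"])   # an Integer and a String cannot be compared

p first_differs, second_differs, prefix_shorter, mixed_types
```

The last case returns nil, which is why every key should have the same kind of value in the same position: a sort that hits a nil comparison raises an ArgumentError.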

sort_by compares the following 2-element arrays:

arr.each { |s| puts "%s-> [%s, %d]" %
  ["\"#{s}\"".ljust(7), "\"#{s[/\D+/]}\"".ljust(4), s[/\d+/].to_i] }

"A1"   -> ["A" , 1]
"A2"   -> ["A" , 2]
"B5"   -> ["B" , 5]
"B12"  -> ["B" , 12]
"A6"   -> ["A" , 6]
"AB12" -> ["AB", 12]
"A8"   -> ["A" , 8]
"B10"  -> ["B" , 10]
"B3"   -> ["B" , 3]
"B4"   -> ["B" , 4]
"B8"   -> ["B" , 8]
"AB2"  -> ["AB", 2]

Insert spaces between the alphabetic and numeric parts of the string

max_len = arr.max_by(&:size).size
  #=> 4
arr.sort_by { |s| "%s%s%d" % [s[/\D+/], " "*(max_len-s.size), s[/\d+/].to_i] }
  #=> ["A1", "A2", "A6", "A8", "AB2", "AB12", "B3", "B4", "B5", "B8",
  #    "B10", "B12"]

Here sort_by compares the following strings:

arr.each { |s| puts "%s-> \"%s\"" %
  ["\"#{s}\"".ljust(7), s[/\D+/] + " "*(max_len-s.size) + s[/\d+/]] }

"A1"   -> "A  1"
"A2"   -> "A  2"
"B5"   -> "B  5"
"B12"  -> "B 12"
"A6"   -> "A  6"
"AB12" -> "AB12"
"A8"   -> "A  8"
"B10"  -> "B 10"
"B3"   -> "B  3"
"B4"   -> "B  4"
"B8"   -> "B  8"
"AB2"  -> "AB 2"



Answer 4:


A natural sort is needed here, not the standard character-value-based (lexicographic) sort. Something like these gems would be a starting point: https://github.com/dogweather/naturally, https://github.com/johnnyshields/naturalsort

Humans treat a string like "A2" as "A" followed by the number 2, and sort by using character-string sorting for the string part and numeric sorting for the numeric part. Standard sort() uses character-value sorting, treating the string as a sequence of characters regardless of what the characters are. So for sort(), "A10" and "A2" look like [ 'A', '1', '0' ] and [ 'A', '2' ]. Since '1' sorts before '2' and the following characters can't change that order, "A10" sorts before "A2". For humans the same strings look like [ "A", 10 ] and [ "A", 2 ], and 10 sorts after 2, so we get the opposite result. The strings can be manipulated to make the character-value-based sort() produce the expected result: make the numeric portion fixed-width and zero-pad it on the left (zeros rather than spaces, to avoid embedded spaces). "A2" then turns into "A02", which does sort before "A10" using standard sort().
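The three views described above can be checked directly. This is only an illustration, with the "human" arrays written out by hand rather than computed:

```ruby
# Character view: what the standard sort effectively compares.
char_a10 = "A10".split("")    # ["A", "1", "0"]
char_a2  = "A2".split("")     # ["A", "2"]
char_order = (char_a10 <=> char_a2)

# Human view: text part plus the number as an Integer (written out by hand).
human_order = (["A", 10] <=> ["A", 2])

# Zero-padded view: plain strings again, but now with fixed-width numbers.
padded_order = ("A10" <=> "A02")

p char_order, human_order, padded_order
```

The character view puts "A10" first, while the human view and the zero-padded view both put "A2" (or "A02") first.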



Source: https://stackoverflow.com/questions/39023861/ruby-alphanumeric-sort-not-working-as-expected
