How can I sort array data alphanumerically in Ruby?
Suppose my array is a = [test_0_1, test_0_2, test_0_3, test_0_4, test_0_5, test_0_6, test_0_7, test_0_8, te
Similar to @ctcherry's answer, but faster:
a.sort_by {|s| "%s%05i%05i" % s.split('_') }.reverse
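To see why this works, it can help to print the keys that sort_by actually compares. A minimal sketch, assuming every element has the form prefix_number_number with numbers below 100000; the sample array and the `key` lambda are made up here for illustration:

```ruby
# Hypothetical sample data in the same shape as the question's array.
a = %w[test_0_9 test_1_10 test_1_2 test_0_10]

# Build the zero-padded key for one element, e.g. "test_1_10" -> "test0000100010".
key = ->(s) { "%s%05i%05i" % s.split('_') }

a.each { |s| puts "#{s} -> #{key.call(s)}" }

# Sorting by the padded key and reversing gives descending natural order.
sorted = a.sort_by(&key).reverse
p sorted
```

Because every numeric field is padded to the same width, plain string comparison of the keys agrees with numeric comparison of the fields.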
EDIT: My tests:
require 'benchmark'
ary = []
100_000.times { ary << "test_#{rand(1000)}_#{rand(1000)}" }
ary.uniq!; puts "Size: #{ary.size}"
Benchmark.bm(5) do |x|
  x.report("sort1") do
    ary.sort_by {|e| "%s%05i%05i" % e.split('_') }.reverse
  end
  x.report("sort2") do
    ary.sort { |a,b|
      ap = a.split('_')
      a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
      bp = b.split('_')
      b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
      b <=> a
    }
  end
  x.report("sort3") do
    ary.sort_by { |v| a, b, c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse
  end
end
Output:
Size: 95166
           user     system      total        real
sort1  3.401000   0.000000   3.401000 (  3.394194)
sort2 94.880000   0.624000  95.504000 ( 95.722475)
sort3  3.494000   0.000000   3.494000 (  3.501201)
Sort routines can have greatly varying processing times. Benchmarking variations of the sort can quickly home in on the fastest way to do things:
#!/usr/bin/env ruby
ary = %w[
test_0_1 test_0_2 test_0_3 test_0_4 test_0_5 test_0_6 test_0_7
test_0_8 test_0_9 test_1_0 test_1_1 test_1_2 test_1_3 test_1_4 test_1_5
test_1_6 test_1_7 test_1_8 test_1_9 test_1_10 test_1_11 test_1_12 test_1_13
test_1_14 test_1_121
]
require 'ap'
ap ary.sort_by { |v| a,b,c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse
And its output:
>> [
>> [ 0] "test_1_121",
>> [ 1] "test_1_14",
>> [ 2] "test_1_13",
>> [ 3] "test_1_12",
>> [ 4] "test_1_11",
>> [ 5] "test_1_10",
>> [ 6] "test_1_9",
>> [ 7] "test_1_8",
>> [ 8] "test_1_7",
>> [ 9] "test_1_6",
>> [10] "test_1_5",
>> [11] "test_1_4",
>> [12] "test_1_3",
>> [13] "test_1_2",
>> [14] "test_1_1",
>> [15] "test_1_0",
>> [16] "test_0_9",
>> [17] "test_0_8",
>> [18] "test_0_7",
>> [19] "test_0_6",
>> [20] "test_0_5",
>> [21] "test_0_4",
>> [22] "test_0_3",
>> [23] "test_0_2",
>> [24] "test_0_1"
>> ]
Testing the algorithms for speed shows:
require 'benchmark'
n = 50_000
Benchmark.bm(8) do |x|
  x.report('sort1') { n.times { ary.sort { |a,b| b <=> a } } }
  x.report('sort2') { n.times { ary.sort { |a,b| a <=> b }.reverse } }
  x.report('sort3') { n.times { ary.sort { |a,b|
    ap = a.split('_')
    a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
    bp = b.split('_')
    b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
    b <=> a
  } } }
  x.report('sort_by1') { n.times { ary.sort_by { |s| s } } }
  x.report('sort_by2') { n.times { ary.sort_by { |s| s }.reverse } }
  x.report('sort_by3') { n.times { ary.sort_by { |s| s.scan(/\d+/).map{ |s| s.to_i } }.reverse } }
  x.report('sort_by4') { n.times { ary.sort_by { |v| a = v.split(/_+/); [a[0], a[1].to_i, a[2].to_i] }.reverse } }
  x.report('sort_by5') { n.times { ary.sort_by { |v| a,b,c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse } }
end
>>               user     system      total        real
>> sort1      0.900000   0.010000   0.910000 (  0.919115)
>> sort2      0.880000   0.000000   0.880000 (  0.893920)
>> sort3     43.840000   0.070000  43.910000 ( 45.970928)
>> sort_by1   0.870000   0.010000   0.880000 (  1.077598)
>> sort_by2   0.820000   0.000000   0.820000 (  0.858309)
>> sort_by3   7.060000   0.020000   7.080000 (  7.623183)
>> sort_by4   6.800000   0.000000   6.800000 (  6.827472)
>> sort_by5   6.730000   0.000000   6.730000 (  6.762403)
>>
Sort1, sort2, sort_by1 and sort_by2 help establish baselines for sort, sort_by, and both of those with reverse.
Sorts sort3 and sort_by3 are two other answers on this page. Sort_by4 and sort_by5 are two spins on how I'd do it, with sort_by5 being the fastest I came up with after a few minutes of tinkering.
This shows how minor differences in the algorithm can make a big difference in run time. With more iterations, or larger arrays being sorted, the differences would be more extreme.
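Much of the gap between sort3 and the sort_by variants likely comes from how often the key is built: sort with a block rebuilds both keys on every comparison, while sort_by builds each key once up front. A rough sketch that counts key constructions; the `calls` counter, the `pad_key` lambda and the generated array are illustrative and not taken from the answers above:

```ruby
# Illustrative helper: builds the zero-padded key and counts how often it runs.
calls = 0
pad_key = lambda do |s|
  calls += 1
  prefix, major, minor = s.split('_')
  format('%s%05d%05d', prefix, major.to_i, minor.to_i)
end

# Deterministic sample data in the "test_N_M" shape.
ary = Array.new(1000) { |i| "test_#{i % 37}_#{i % 101}" }.uniq

# sort with a block: two keys are rebuilt for every comparison.
calls = 0
by_sort = ary.sort { |a, b| pad_key.call(b) <=> pad_key.call(a) }
sort_calls = calls

# sort_by: one key per element, built before any comparison happens.
calls = 0
by_sort_by = ary.sort_by(&pad_key).reverse
sort_by_calls = calls

puts "sort:    #{sort_calls} key constructions"
puts "sort_by: #{sort_by_calls} key constructions"
```

Both calls produce the same descending order here; only the amount of work spent building keys differs.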
You can pass a block to sort to customize the ordering. In your case the numbers aren't zero padded, so a plain string comparison would put "10" before "9". This method zero pads the numerical parts, then compares them, resulting in your desired sort order.
a.sort { |a,b|
  ap = a.split('_')
  a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
  bp = b.split('_')
  b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
  b <=> a
}
A generic algorithm for sorting strings that contain non-padded sequence numbers at arbitrary positions:
padding = 4
list.sort{|a,b|
  # rjust leaves numbers longer than padding untouched; "0" * a negative count would raise
  a,b = [a,b].map{|s| s.gsub(/\d+/){|m| m.rjust(padding, "0") } }
  a<=>b
}
where padding is the field length you want the numbers to have during comparison. Any number found in a string is zero padded before comparison if it has fewer than padding digits, which yields the expected sort order.
To get the result asked for by user682932, simply add .reverse after the sort block, which flips the natural (ascending) order into descending order.
With a pre-loop over the strings you can of course dynamically find the maximum number of digits in the list of strings, which you can use instead of hard-coding some arbitrary padding length, but that would require more processing (slower) and a bit more code. E.g.
padding = list.reduce(0){|max,s|
  x = s.scan(/\d+/).map{|m|m.size}.max
  (x||0) > max ? x : max
}
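Putting the two pieces together, here is a minimal sketch that derives the padding from the data and then sorts with sort_by, so each padded key is built only once. The method name `natural_sort` and the sample list are hypothetical, chosen for illustration:

```ruby
# Hypothetical helper: sorts strings so embedded numbers compare by value.
def natural_sort(list)
  # Longest run of digits found in any string (0 if there are no digits at all).
  padding = list.flat_map { |s| s.scan(/\d+/).map(&:size) }.max || 0

  # Left-pad every digit run with zeros to that width, then compare as strings.
  list.sort_by { |s| s.gsub(/\d+/) { |m| m.rjust(padding, '0') } }
end

list = %w[test_1_10 test_0_9 test_1_2 test_1_121 test_0_10]
p natural_sort(list)          # ascending
p natural_sort(list).reverse  # descending, as asked in the question
```

One caveat of this sketch: numbers that differ only in leading zeros (such as "01" and "1") produce identical keys, so their relative order is left to the sort.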
Posting here a more general way to perform a natural decimal sort in Ruby. The following is inspired by my code for sorting "like Xcode" from https://github.com/CocoaPods/Xcodeproj/blob/ca7b41deb38f43c14d066f62a55edcd53876cd07/lib/xcodeproj/project/object/helpers/sort_helper.rb, itself loosely inspired by https://rosettacode.org/wiki/Natural_sorting#Ruby.
Even if it's clear that we want "10" to be after "2" for a natural decimal sort, there are other aspects to consider with multiple possible alternative behaviors wanted:
With those considerations:
I use scan instead of split, because we're potentially going to have three kinds of substrings to compare (digits, spaces, all-the-rest).
I implement it as a Comparable class with def <=>(other), because it's not possible to simply map each substring to something else that would have two distinct behaviors depending on context (the first pass and the equality pass).
This results in a somewhat lengthy implementation, but it works nicely for edge cases:
# Wrapper for a string that performs a natural decimal sort (alphanumeric).
# @example
#   arrayOfFilenames.sort_by { |s| NaturalSortString.new(s) }
class NaturalSortString
  include Comparable
  attr_reader :str_fallback, :ints_and_strings, :ints_and_strings_fallback, :str_pattern

  def initialize(str)
    # fallback pass: case is inverted
    @str_fallback = str.swapcase
    # first pass: digits are used as integers, spaces are compacted, case is ignored
    @ints_and_strings = str.scan(/\d+|\s+|[^\d\s]+/).map do |s|
      case s
      when /\d/ then Integer(s, 10)
      when /\s/ then ' '
      else s.downcase
      end
    end
    # second pass: digits are inverted, case is inverted
    @ints_and_strings_fallback = @str_fallback.scan(/\d+|\D+/).map do |s|
      case s
      when /\d/ then Integer(s.reverse, 10)
      else s
      end
    end
    # comparing patterns
    @str_pattern = @ints_and_strings.map { |el| el.is_a?(Integer) ? :i : :s }.join
  end

  def <=>(other)
    if str_pattern.start_with?(other.str_pattern) || other.str_pattern.start_with?(str_pattern)
      compare = ints_and_strings <=> other.ints_and_strings
      if compare != 0
        # we sort naturally (literal ints, spaces simplified, case ignored)
        compare
      else
        # natural equality, we use the fallback sort (int reversed, case swapped)
        ints_and_strings_fallback <=> other.ints_and_strings_fallback
      end
    else
      # type mismatch, we sort alphabetically (case swapped)
      str_fallback <=> other.str_fallback
    end
  end
end
Example 1:
arrayOfFilenames.sort_by { |s| NaturalSortString.new(s) }
Example 2:
arrayOfFilenames.sort! do |x, y|
  NaturalSortString.new(x) <=> NaturalSortString.new(y)
end
You may find my test case at https://github.com/CocoaPods/Xcodeproj/blob/ca7b41deb38f43c14d066f62a55edcd53876cd07/spec/project/object/helpers/sort_helper_spec.rb, where I used this reference for ordering: [ ' a', ' a', '0.1.1', '0.1.01', '0.1.2', '0.1.10', '1', '01', '1a', '2', '2 a', '10', 'a', 'A', 'a ', 'a 2', 'a1', 'A1B001', 'A01B1', ]
Of course, feel free to customize your own sorting logic now.
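The str_pattern check in <=> matters because Ruby cannot compare an Integer with a String: Array#<=> returns nil as soon as it meets such a pair, and sort_by then raises. A small sketch of that failure mode, using a simplified tokenizer (`tokens` is a hypothetical helper, not part of the class above):

```ruby
# Hypothetical simplified tokenizer: digit runs become Integers, the rest stays a String.
def tokens(str)
  str.scan(/\d+|\D+/).map { |s| s.match?(/\A\d+\z/) ? Integer(s, 10) : s.downcase }
end

same_shape  = tokens('a2') <=> tokens('a10')  # both are [String, Integer]
mixed_shape = tokens('1a') <=> tokens('a1')   # [Integer, String] vs [String, Integer]

p same_shape
p mixed_shape

# With mixed shapes the keys are not comparable, so sort_by cannot order them.
begin
  %w[1a a1].sort_by { |s| tokens(s) }
rescue ArgumentError => e
  puts "sort_by failed: #{e.message}"
end
```

Falling back to a plain string comparison when the patterns differ, as the class does, is one way to sidestep this.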
From the looks of it, you want to use the sort function and/or the reverse function.
ruby-1.9.2-p136 :009 > a = ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"]
=> ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"]
ruby-1.9.2-p136 :010 > a.sort
=> ["abc_1", "abc_11", "abc_2", "abc_22", "abc_3"]
ruby-1.9.2-p136 :011 > a.sort.reverse
=> ["abc_3", "abc_22", "abc_2", "abc_11", "abc_1"]
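Note that plain sort compares strings character by character, so "abc_11" lands before "abc_2" in the session above. If numeric order on the suffix is wanted instead, a sketch along the lines of the sort_by answers, assuming every element has the form prefix_number:

```ruby
a = ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"]

# Sort by [prefix, numeric suffix] so that 2 < 11 < 22.
ascending = a.sort_by do |s|
  prefix, number = s.split('_')
  [prefix, number.to_i]
end

p ascending          # ["abc_1", "abc_2", "abc_3", "abc_11", "abc_22"]
p ascending.reverse  # ["abc_22", "abc_11", "abc_3", "abc_2", "abc_1"]
```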