How to efficiently concatenate multiple arrays in Ruby?

前端 未结 3 1209
无人及你
无人及你 2021-01-02 02:56

I just wanted to concatenate multiple arrays in Ruby and couldn\'t find a satisfying way to do so.

Example input:

foo = [1, 2, 3]
ba         


        
相关标签:
3条回答
  • 2021-01-02 03:11

    Did some benchmarks and simple + is the most efficient. So i would suggest to neglect the intermediate creation of an array.

    You could add a new method concat_all to Array like this, but you would have to take into account mixed and multi-dimensional arrays also.

    class Array
      def concat_all 
        self.reduce([], :+)
      end
    end
    [a, b, c].concat_all # a huge array
    [a, b, c].concat_all.length #300000
    

    Here my benchmarks

    require 'Benchmark'
    N = 1000
    
    class Array
      def concat_all 
        self.reduce([], :+)
      end
      def concat_all2
        # just a quick test with fixed numbers for the fill method Stephan proposes but in Ruby itself
        d = Array.new(300_000)
        d[0..99999] = self[0]
        d[100_000..199999] = self[1]
        d[200_000..299999] = self[2]
        d
      end
      def concat_all3
        self.flatten
      end
    end
    
    # small arrays
    a = (1..10).to_a
    b = (11..20).to_a
    c = (21..30).to_a
    
    Benchmark.bm do |r|
      r.report('plus       ')  { N.times { a + b + c }}
      r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
      r.report('push       ') { N.times { [].push(*a).push(*b).push(*c) }}
      r.report('<<         ') { N.times { ([] << a << b << c).flatten}}
      r.report('splash     ') { N.times {[*a, *b, *c]} }
      r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
      r.report('concat_all3')  { N.times { [a, b, c].concat_all3 }}
      r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
    end
    
    #large arrays
    a = (1..100_000).to_a
    b = (100_001..200_000).to_a
    c = (200_001..300_000).to_a
    
    Benchmark.bm do |r|
      r.report('plus       ')  { N.times { a + b + c }}
      r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
      r.report('push       ') { N.times { [].push(*a).push(*b).push(*c) }}
      r.report('<<         ') { N.times { ([] << a << b << c).flatten}}
      r.report('splash     ') { N.times {[*a, *b, *c]} }
      r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
      r.report('concat_all2')  { N.times { [a, b, c].concat_all2 }}
      r.report('concat_all3')  { N.times { [a, b, c].concat_all3 }}
      r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
    end
    

    And here the results

    # results for small arrays
           user     system      total        real
    plus         0.000000   0.000000   0.000000 (  0.000416)
    concat       0.000000   0.000000   0.000000 (  0.000592)
    push         0.000000   0.000000   0.000000 (  0.000441)
    <<           0.000000   0.000000   0.000000 (  0.003387)
    splash       0.000000   0.000000   0.000000 (  0.000789)
    concat_all   0.000000   0.000000   0.000000 (  0.001480)
    concat_all3  0.016000   0.000000   0.016000 (  0.003496)
    flat_map     0.000000   0.000000   0.000000 (  0.001036)
    
    # results for big arrays
           user     system      total        real
    plus         0.686000   0.671000   1.357000 (  1.351171)
    concat       0.890000   0.733000   1.623000 (  1.630155)
    push         1.466000   0.624000   2.090000 (  2.092684)
    <<          23.837000   1.045000  24.882000 ( 24.885238)
    splash       1.029000   1.264000   2.293000 (  2.332560)
    concat_all   0.687000   0.967000   1.654000 (  1.709321)
    concat_all2  0.936000   0.780000   1.716000 (  1.730428)
    concat_all3 24.242000   0.998000  25.240000 ( 25.278264)
    flat_map     0.780000   0.765000   1.545000 (  1.551654)
    
    0 讨论(0)
  • 2021-01-02 03:15

    I've created another benchmark, comparing +, concat and a custom C extension with a variable number of arrays.

    Result

    • the C extension was always fastest and roughly 2-3x faster than concat
    • plus is getting really slow if you concatenate many arrays

    Conclusion

    Although "2-3x" sounds like a huge improvement, it's just a few milliseconds in absolute terms. I was expecting a bigger difference by not having to resize the array, but this is apparently not a huge factor.

    IMO, concat is a decent performer and I see no urgent need for a C extension.


    My test arrays contain nil values. Other elements don't seem to produce different results (in relative terms).

    I didn't include flat_map, because it is equivalent to concat.

    Concatenating 3 arrays of size 100 (10000 times)
                     user     system      total        real
    plus         0.020000   0.000000   0.020000 (  0.027927)
    concat       0.020000   0.010000   0.030000 (  0.033204)
    c_extension  0.010000   0.010000   0.020000 (  0.010727)
    
    Concatenating 10 arrays of size 100 (10000 times)
                     user     system      total        real
    plus         0.110000   0.070000   0.180000 (  0.180417)
    concat       0.050000   0.020000   0.070000 (  0.065299)
    c_extension  0.010000   0.010000   0.020000 (  0.025475)
    
    Concatenating 10 arrays of size 1000 (10000 times)
                     user     system      total        real
    plus         0.690000   0.560000   1.250000 (  1.252319)
    concat       0.180000   0.130000   0.310000 (  0.303365)
    c_extension  0.120000   0.120000   0.240000 (  0.248589)
    

    plus is excluded from the following results

    Concatenating 10 arrays of size 100000 (100 times)
                     user     system      total        real
    concat       0.220000   0.340000   0.560000 (  0.568730)
    c_extension  0.130000   0.150000   0.280000 (  0.281354)
    
    Concatenating 100 arrays of size 10000 (100 times)
                     user     system      total        real
    concat       0.210000   0.320000   0.530000 (  0.519030)
    c_extension  0.160000   0.140000   0.300000 (  0.304751)
    
    Concatenating 1000 arrays of size 1000 (100 times)
                     user     system      total        real
    concat       0.240000   0.330000   0.570000 (  0.563511)
    c_extension  0.150000   0.120000   0.270000 (  0.283546)
    
    Concatenating 10000 arrays of size 100 (100 times)
                     user     system      total        real
    concat       0.330000   0.310000   0.640000 (  0.643987)
    c_extension  0.170000   0.120000   0.290000 (  0.286489)
    
    Concatenating 100000 arrays of size 10 (100 times)
                     user     system      total        real
    concat       1.300000   0.340000   1.640000 (  1.648687)
    c_extension  0.310000   0.150000   0.460000 (  0.458214)
    

    Test code:

    require 'benchmark'
    
    values = [
      # small
      { count: 3,      size: 100,    n: 10000 },
      { count: 10,     size: 100,    n: 10000 },
      { count: 10,     size: 1000,   n: 10000 },
      # large
      { count: 10,      size: 100000, n: 100 },
      { count: 100,     size: 10000,  n: 100 },
      { count: 1000,    size: 1000,   n: 100 },
      { count: 10000,   size: 100,    n: 100 },
      { count: 100000,  size: 10,     n: 100 }
    ]
    
    values.each_with_index do |h, i|
      count, size, n = h.values_at(:count, :size, :n)
      arrays = Array.new(count) { Array.new(size) }
    
      puts
      puts "Concatenating #{count} arrays of size #{size} (#{n} times)"
      Benchmark.bm(10) do |r|
        r.report('plus')        { n.times { arrays.reduce(:+) } } if i < 3
        r.report('concat')      { n.times { arrays.reduce([], :concat) } }
        r.report('c_extension') { n.times { Array.concat(*arrays) } }
      end
    end
    

    C extension: (a patch actually, I've added this to Ruby's array.c)

    VALUE
    rb_ary_s_concat(int argc, VALUE *argv, VALUE klass)
    {
      VALUE ary;
      long len = 0, i;
      for (i=0; i<argc; i++) {
        argv[i] = to_ary(argv[i]);
        len += RARRAY_LEN(argv[i]);
      }
      ary = rb_ary_new2(len);
      long beg = 0;
      for (i=0; i<argc; i++) {
        ary_memcpy(ary, beg, RARRAY_LEN(argv[i]), RARRAY_CONST_PTR(argv[i]));
        beg += RARRAY_LEN(argv[i]);
      }
      ARY_SET_LEN(ary, len);
      return ary;
    }
    

    You have to register this method in Init_Array via:

    rb_define_singleton_method(rb_cArray, "concat", rb_ary_s_concat, -1);
    
    0 讨论(0)
  • 2021-01-02 03:26

    If you've already determined that multiple concatenation is the fastest method, you can write it nicer using reduce:

    [foo, bar, baz].reduce([], :concat)
    
    0 讨论(0)
提交回复
热议问题