Ruby - Array.join versus String Concatenation (Efficiency)

Asked by 面向向阳花 on 2021-02-05 06:56 · tagged: backend · 5 answers · 1795 views

I recall getting a scolding for concatenating strings in Python once upon a time. I was told that it is more efficient to create a list of strings and join them later. Does the same principle apply in Ruby — is Array#join more efficient than concatenating strings?

5 Answers
  • 2021-02-05 07:10

    Yes, it's the same principle. I remember a Project Euler puzzle where I tried it both ways, and calling join was much faster.

    If you check out the Ruby source, join is implemented entirely in C, so it's going to be a lot faster than concatenating strings in Ruby code (no intermediate object creation per step, no extra garbage collection):

    /*
     *  call-seq:
     *     array.join(sep=$,)    -> str
     *  
     *  Returns a string created by converting each element of the array to
     *  a string, separated by <i>sep</i>.
     *     
     *     [ "a", "b", "c" ].join        #=> "abc"
     *     [ "a", "b", "c" ].join("-")   #=> "a-b-c"
     */
    
    static VALUE
    rb_ary_join_m(argc, argv, ary)
        int argc;
        VALUE *argv;
        VALUE ary;
    {
        VALUE sep;
    
        rb_scan_args(argc, argv, "01", &sep);
        if (NIL_P(sep)) sep = rb_output_fs;
    
        return rb_ary_join(ary, sep);
    }
    

    where rb_ary_join is:

    VALUE rb_ary_join(ary, sep)
        VALUE ary, sep;
    {
        long len = 1, i;
        int taint = Qfalse;
        VALUE result, tmp;
    
        if (RARRAY(ary)->len == 0) return rb_str_new(0, 0);
        if (OBJ_TAINTED(ary) || OBJ_TAINTED(sep)) taint = Qtrue;
    
        for (i=0; i<RARRAY(ary)->len; i++) {
            tmp = rb_check_string_type(RARRAY(ary)->ptr[i]);
            len += NIL_P(tmp) ? 10 : RSTRING(tmp)->len;
        }
        if (!NIL_P(sep)) {
            StringValue(sep);
            len += RSTRING(sep)->len * (RARRAY(ary)->len - 1);
        }
        result = rb_str_buf_new(len);
        for (i=0; i<RARRAY(ary)->len; i++) {
            tmp = RARRAY(ary)->ptr[i];
            switch (TYPE(tmp)) {
              case T_STRING:
                break;
              case T_ARRAY:
                if (tmp == ary || rb_inspecting_p(tmp)) {
                    tmp = rb_str_new2("[...]");
                }
                else {
                    VALUE args[2];
    
                    args[0] = tmp;
                    args[1] = sep;
                    tmp = rb_protect_inspect(inspect_join, ary, (VALUE)args);
                }
                break;
              default:
                tmp = rb_obj_as_string(tmp);
            }
            if (i > 0 && !NIL_P(sep))
                rb_str_buf_append(result, sep);
            rb_str_buf_append(result, tmp);
            if (OBJ_TAINTED(tmp)) taint = Qtrue;
        }
    
        if (taint) OBJ_TAINT(result);
        return result;
    }
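
    As an aside (my own addition, not part of the original answer): you can see the object-creation difference between + and << directly in Ruby, since + returns a brand-new String while << mutates its receiver in place:

```ruby
# `+` allocates a new String on every call; `<<` appends to the existing
# object. `equal?` compares object identity, not contents.
a = "foo"
sum = a + "bar"   # fresh String; `a` is untouched
a2  = a << "baz"  # in-place append; returns `a` itself

puts sum.equal?(a)  # false -- different objects
puts a2.equal?(a)   # true  -- same object, mutated
puts a              # "foobaz"
```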
    
  • 2021-02-05 07:17

    Funny, benchmarking gives surprising results (unless I'm doing something wrong):

    require 'benchmark'
    
    N = 1_000_000
    Benchmark.bm(20) do |rep|
    
      rep.report('+') do
        N.times do
          res = 'foo' + 'bar' + 'baz'
        end
      end
    
      rep.report('join') do
        N.times do
          res = ['foo', 'bar', 'baz'].join
        end
      end
    
      rep.report('<<') do
        N.times do
          res = 'foo' << 'bar' << 'baz'
        end
      end
    end
    

    gives

    jablan@poneti:~/dev/rb$ ruby concat.rb 
                              user     system      total        real
    +                     1.760000   0.000000   1.760000 (  1.791334)
    join                  2.410000   0.000000   2.410000 (  2.412974)
    <<                    1.380000   0.000000   1.380000 (  1.376663)
    

    join turns out to be the slowest. The difference might come from creating the array, but that's something you would have to do anyway.
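
    To separate the cost of building the array literal from the cost of join itself, one variation (my sketch, not part of the original answer) is to hoist the array out of the timed loop:

```ruby
require 'benchmark'

N = 100_000
PARTS = ['foo', 'bar', 'baz']  # built once, outside the timed loops

Benchmark.bm(20) do |rep|
  rep.report('+') do
    N.times { 'foo' + 'bar' + 'baz' }
  end

  rep.report('join (hoisted)') do
    N.times { PARTS.join }
  end
end
```

    If join still loses with the array allocation taken out of the loop, the per-call overhead is in join itself (argument scanning, separator handling) rather than in building the array.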

    Oh BTW,

    jablan@poneti:~/dev/rb$ ruby -v
    ruby 1.9.1p378 (2010-01-10 revision 26273) [i486-linux]
    
  • 2021-02-05 07:20

    Try it yourself with the Benchmark class.

    require "benchmark"
    
    n = 1000000
    Benchmark.bmbm do |x|
      x.report("concatenation") do
        foo = ""
        n.times do
          foo << "foobar"
        end
      end
    
      x.report("using lists") do
        foo = []
        n.times do
          foo << "foobar"
        end
        string = foo.join
      end
    end
    

    This produces the following output:

    Rehearsal -------------------------------------------------
    concatenation   0.300000   0.010000   0.310000 (  0.317457)
    using lists     0.380000   0.050000   0.430000 (  0.442691)
    ---------------------------------------- total: 0.740000sec
    
                        user     system      total        real
    concatenation   0.260000   0.010000   0.270000 (  0.309520)
    using lists     0.310000   0.020000   0.330000 (  0.363102)
    

    So it looks like concatenation is a little faster in this case. Benchmark on your system for your use-case.

  • 2021-02-05 07:25

    I was just reading about this. Attached is a link talking about it.

    Building-a-String-from-Parts

    From what I understand, strings in Python and Java are immutable objects, unlike arrays/lists, while in Ruby both strings and arrays are mutable. There might be a minimal difference in speed between building a string with String#concat (or <<) and using Array#join, but it doesn't seem to be a big issue.
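
    A quick way to see that mutability in action (my example, not from the linked article): << grows the same String object rather than allocating a new one, which you can verify with object_id:

```ruby
s = "abc"
id_before = s.object_id

s << "def"                      # mutates `s` in place

puts s                          # "abcdef"
puts s.object_id == id_before   # true -- still the very same object
```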

    I think the link will explain this a lot better than I did.

    Thanks,

    Martin

  • 2021-02-05 07:32

    " The problem is the pile of data as a whole. In his first situation, he had two types of data stockpiling: (1) a temporary string for each row in his CSV file, with fixed quotations and such things, and (2) the giant string containing everything. If each string is 1k and there are 5,000 rows...

    Scenario One: build a big string from little strings

    temporary strings: 5 megs (5,000k)
    massive string: 5 megs (5,000k)
    TOTAL: 10 megs (10,000k)

    Dave's improved script swapped the massive string for an array. He kept the temporary strings, but stored them in an array. The array will only end up costing 5000 * sizeof(VALUE) rather than the full size of each string. And generally, a VALUE is four bytes.

    Scenario Two: storing strings in an array

    strings: 5 megs (5,000k)
    massive array: 20k

    Then, when we need to make a big string, we call join. Now we're up to ten megs and suddenly all those strings become temporary strings and they can all be released at once. It's a huge cost at the end, but it's a lot more efficient than a gradual crescendo that eats resources the whole time. "

    http://viewsourcecode.org/why/hacking/theFullyUpturnedBin.html

    ^ It's actually better for memory/garbage-collection performance to delay the join until the end, just as I was taught to do in Python. The reason being that you get one big chunk of allocation towards the end, followed by an instant release of all the temporary objects.
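
    The allocation pattern described above can be observed with ObjectSpace.count_objects (a rough sketch of mine; exact counts vary across Ruby versions, but += should allocate roughly twice as many Strings as collect-then-join):

```ruby
# Counts how many T_STRING objects a block allocates. GC is disabled so
# the counts aren't skewed by a collection happening mid-measurement.
def string_allocations
  GC.disable
  before = ObjectSpace.count_objects[:T_STRING]
  yield
  after = ObjectSpace.count_objects[:T_STRING]
  after - before
ensure
  GC.enable
end

n = 1_000

grow = string_allocations do
  s = ''
  n.times { s += 'x' }      # every += builds a brand-new String
end

join = string_allocations do
  parts = []
  n.times { parts << 'x' }  # only the small literals are allocated
  parts.join                # one final allocation for the result
end

puts "+=   allocated ~#{grow} strings"
puts "join allocated ~#{join} strings"
```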
