Question
While benchmarking to answer this question about the fastest way to concatenate arrays, I was surprised to find that the same benchmarks ran a lot slower under JRuby.
Does this mean the old adage about JRuby being faster than MRI Ruby no longer holds? Or is this about how arrays are treated in JRuby?
Here are the benchmark and the results for both MRI Ruby 2.3.0 and JRuby 9.1.2.0.
Both ran on a 64-bit Windows 7 box, with all 4 processors 50-60% busy and about 5.5 GB of memory in use. JRuby had to be started with the parameter -J-Xmx1500M to provide enough heap space. I had to remove the test with push because of a "stack level too deep" error, and I also removed the slowest methods to keep the tests from running too long. Java runtime used: 1.7.0_21
require 'benchmark'
N = 100
class Array
def concat_all
self.reduce([], :+)
end
end
# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a
Benchmark.bm do |r|
r.report('plus ') { N.times { a + b + c }}
r.report('concat ') { N.times { [].concat(a).concat(b).concat(c) }}
r.report('splash ') { N.times {[*a, *b, *c]} }
r.report('concat_all ') { N.times { [a, b, c].concat_all }}
r.report('flat_map ') { N.times {[a, b, c].flat_map(&:itself)} }
end
#large arrays
a = (1..10_000_000).to_a
b = (10_000_001..20_000_000).to_a
c = (20_000_001..30_000_000).to_a
Benchmark.bm do |r|
r.report('plus ') { N.times { a + b + c }}
r.report('concat ') { N.times { [].concat(a).concat(b).concat(c) }}
r.report('splash ') { N.times {[*a, *b, *c]} }
r.report('concat_all ') { N.times { [a, b, c].concat_all }}
r.report('flat_map ') { N.times {[a, b, c].flat_map(&:itself)} }
end
This question is not about the different methods used; see the original question for that. In both situations MRI is 7 times faster! Can someone explain why? I'm also curious how other implementations, such as RBX (Rubinius), perform.
C:\Users\...>d:\jruby\bin\jruby -J-Xmx1500M concat3.rb
user system total real
plus 0.000000 0.000000 0.000000 ( 0.000946)
concat 0.000000 0.000000 0.000000 ( 0.001436)
splash 0.000000 0.000000 0.000000 ( 0.001456)
concat_all 0.000000 0.000000 0.000000 ( 0.002177)
flat_map 0.010000 0.000000 0.010000 ( 0.003179)
user system total real
plus 140.166000 0.000000 140.166000 (140.158687)
concat 143.475000 0.000000 143.475000 (143.473786)
splash 139.408000 0.000000 139.408000 (139.406671)
concat_all 144.475000 0.000000 144.475000 (144.474436)
flat_map 143.519000 0.000000 143.519000 (143.517636)
C:\Users\...>ruby concat3.rb
user system total real
plus 0.000000 0.000000 0.000000 ( 0.000074)
concat 0.000000 0.000000 0.000000 ( 0.000065)
splash 0.000000 0.000000 0.000000 ( 0.000098)
concat_all 0.000000 0.000000 0.000000 ( 0.000141)
flat_map 0.000000 0.000000 0.000000 ( 0.000122)
user system total real
plus 15.226000 6.723000 21.949000 ( 21.958854)
concat 11.700000 9.142000 20.842000 ( 20.928087)
splash 21.247000 12.589000 33.836000 ( 33.933170)
concat_all 14.508000 8.315000 22.823000 ( 22.871641)
flat_map 11.170000 8.923000 20.093000 ( 20.170945)
Answer 1:
The general rule (as mentioned in the comments) is that JRuby/JVM needs warmup. Usually bmbm is a good fit, although the repeat count should be increased, e.g. TIMES=1000 (at least for the small-array cases). Also, 1.5G might not be enough for optimal performance of JRuby (I noticed a considerable change in the numbers going from -Xmx2g to -Xmx3g). Here are the results:
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
$ ruby concat3.rb
Rehearsal -----------------------------------------------
plus 0.000000 0.000000 0.000000 ( 0.000076)
concat 0.000000 0.000000 0.000000 ( 0.000070)
splash 0.000000 0.000000 0.000000 ( 0.000099)
concat_all 0.000000 0.000000 0.000000 ( 0.000136)
flat_map 0.000000 0.000000 0.000000 ( 0.000138)
-------------------------------------- total: 0.000000sec
user system total real
plus 0.000000 0.000000 0.000000 ( 0.000051)
concat 0.000000 0.000000 0.000000 ( 0.000059)
splash 0.000000 0.000000 0.000000 ( 0.000083)
concat_all 0.000000 0.000000 0.000000 ( 0.000120)
flat_map 0.000000 0.000000 0.000000 ( 0.000173)
Rehearsal -----------------------------------------------
plus 43.040000 3.320000 46.360000 ( 46.351004)
concat 15.080000 3.870000 18.950000 ( 19.228059)
splash 49.680000 4.820000 54.500000 ( 54.587707)
concat_all 51.840000 5.260000 57.100000 ( 57.114867)
flat_map 17.380000 5.340000 22.720000 ( 22.716987)
------------------------------------ total: 199.630000sec
user system total real
plus 42.880000 3.600000 46.480000 ( 46.506013)
concat 17.230000 5.290000 22.520000 ( 22.890809)
splash 60.300000 7.480000 67.780000 ( 67.878534)
concat_all 54.910000 6.480000 61.390000 ( 61.404383)
flat_map 17.310000 5.570000 22.880000 ( 23.223789)
...
jruby 9.1.6.0 (2.3.1) 2016-11-09 0150a76 Java HotSpot(TM) 64-Bit Server VM 25.112-b15 on 1.8.0_112-b15 +jit [linux-x86_64]
$ jruby -J-Xmx3g concat3.rb
Rehearsal -----------------------------------------------
plus 0.010000 0.000000 0.010000 ( 0.001445)
concat 0.000000 0.000000 0.000000 ( 0.002534)
splash 0.000000 0.000000 0.000000 ( 0.001791)
concat_all 0.000000 0.000000 0.000000 ( 0.002513)
flat_map 0.010000 0.000000 0.010000 ( 0.007088)
-------------------------------------- total: 0.020000sec
user system total real
plus 0.010000 0.000000 0.010000 ( 0.002700)
concat 0.000000 0.000000 0.000000 ( 0.001085)
splash 0.000000 0.000000 0.000000 ( 0.001569)
concat_all 0.000000 0.000000 0.000000 ( 0.003052)
flat_map 0.000000 0.000000 0.000000 ( 0.002252)
Rehearsal -----------------------------------------------
plus 32.410000 0.670000 33.080000 ( 17.385688)
concat 18.610000 0.060000 18.670000 ( 11.206419)
splash 57.770000 0.330000 58.100000 ( 25.366032)
concat_all 19.100000 0.030000 19.130000 ( 13.747319)
flat_map 16.160000 0.040000 16.200000 ( 10.534130)
------------------------------------ total: 145.180000sec
user system total real
plus 16.060000 0.040000 16.100000 ( 11.737483)
concat 15.950000 0.030000 15.980000 ( 10.480468)
splash 47.870000 0.130000 48.000000 ( 22.668069)
concat_all 19.150000 0.030000 19.180000 ( 13.934314)
flat_map 16.850000 0.020000 16.870000 ( 10.862716)
... so it seems to be the opposite: for the large arrays, MRI 2.3 is 2-5x slower than JRuby 9.1
$ cat concat3.rb
require 'benchmark'
N = (ENV['TIMES'] || 100).to_i
class Array
def concat_all
self.reduce([], :+)
end
end
# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a
Benchmark.bmbm do |r|
r.report('plus ') { N.times { a + b + c }}
r.report('concat ') { N.times { [].concat(a).concat(b).concat(c) }}
r.report('splash ') { N.times {[*a, *b, *c]} }
r.report('concat_all ') { N.times { [a, b, c].concat_all }}
r.report('flat_map ') { N.times {[a, b, c].flat_map(&:itself)} }
end
#large arrays
a = (1..10_000_000).to_a
b = (10_000_001..20_000_000).to_a
c = (20_000_001..30_000_000).to_a
Benchmark.bmbm do |r|
r.report('plus ') { N.times { a + b + c }}
r.report('concat ') { N.times { [].concat(a).concat(b).concat(c) }}
r.report('splash ') { N.times {[*a, *b, *c]} }
r.report('concat_all ') { N.times { [a, b, c].concat_all }}
r.report('flat_map ') { N.times {[a, b, c].flat_map(&:itself)} }
end
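The warmup effect described at the top of this answer can also be observed directly, without bmbm, by timing several consecutive batches of the same operation. This is only a minimal sketch: the helper name batch_times and its default values are made up for illustration and are not part of concat3.rb. On a JIT-compiling runtime the first batches would be expected to be slower than the later ones, but the actual numbers depend entirely on the machine and runtime.

```ruby
require 'benchmark'

# Hypothetical helper (not part of concat3.rb): time `rounds` consecutive
# batches of the same block, so that a warmup effect, if there is one,
# shows up as batch times that fall from one batch to the next.
def batch_times(rounds: 5, per_round: 10_000)
  Array.new(rounds) do
    Benchmark.realtime { per_round.times { yield } }
  end
end

a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

times = batch_times { a + b + c }
times.each_with_index do |t, i|
  puts format('batch %d: %.6f s', i + 1, t)
end
```

Running the same file under ruby and under jruby and comparing the first and last batch gives a rough idea of how many repetitions are needed before the numbers settle.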
Answer 2:
What I have learned from these comments and answers, and from the tests I did myself afterwards:
- the OS probably makes a difference; I would have liked more answers from different setups, so here I'm just guessing
- the fastest method differs between runtimes (MRI or JRuby, 32 or 64 bit, JRE), so claiming that one method is better than another is difficult; on my system the plus method was fastest in almost all circumstances, but I didn't use Java HotSpot like kares did
- in 64-bit JRuby you can specify a much higher heap than in 32-bit (1.5G on my system); in 64-bit I could use more heap than I have memory (a bug somewhere?)
- higher heaps speed up operations that use a lot of memory, like the huge arrays I used
- use the latest Java runtime; speed is better
- JRuby needs a warmup: a method needs to run a number of times before it is compiled, so use .bm and .bmbm with different repeat values to find that threshold
- sometimes MRI is faster, but with the right parameters and warmup JRuby was 3 to 3.5 times as fast on my system for this particular test
The last point, together with the time it takes to load the JVM, makes MRI better for short ad hoc scripts and JRuby better for processing-heavy, longer-running processes with methods that are repeated often, so JRuby would be better for running servers and services.
What I saw confirmed: do your own benchmarks for long or repeated processes. Both implementations have made big improvements in speed compared to earlier versions. And let's not forget: Ruby may be a slower runner, but it makes for faster development, and if you compare the cost of some extra hardware to that of some extra developers...
Thanks to all the commenters and kares for their expertise.
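Since the results depend so much on the runtime, bitness, JRE and heap size, it may help to print those details next to every benchmark run. The following is a rough sketch, not something used in the tests above: runtime_info is a hypothetical helper name, and the JRuby branch assumes the standard Java integration (require 'java') is available.

```ruby
# Hypothetical helper: collect the details of the runtime a benchmark ran on.
def runtime_info
  info = {
    engine: RUBY_ENGINE,        # "ruby" for MRI, "jruby" for JRuby
    ruby_version: RUBY_VERSION,
    platform: RUBY_PLATFORM
  }
  if RUBY_ENGINE == 'jruby'
    require 'java'
    # Maximum heap the JVM will attempt to use (what -J-Xmx controls), in MB.
    info[:max_heap_mb] = java.lang.Runtime.getRuntime.maxMemory / (1024 * 1024)
    info[:java_version] = java.lang.System.getProperty('java.version')
  end
  info
end

runtime_info.each { |key, value| puts "#{key}: #{value}" }
```

Pasting this output above the benchmark table makes it easier to compare results between systems.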
EDIT
Out of curiosity I also ran the test with Rubinius in a Docker container (I'm on Windows): rubinius 3.69 (2.3.1 a57071c6 2016-11-17 3.8.0) [x86_64-linux-gnu]
Only concat and flat_map are on par with MRI; I wonder whether these methods are implemented in C and the rest in pure Ruby.
Rehearsal -----------------------------------------------
plus 0.000000 0.000000 0.000000 ( 0.000742)
concat 0.000000 0.000000 0.000000 ( 0.000093)
splash 0.000000 0.000000 0.000000 ( 0.000619)
concat_all 0.000000 0.000000 0.000000 ( 0.001357)
flat_map 0.000000 0.000000 0.000000 ( 0.001536)
-------------------------------------- total: 0.000000sec
user system total real
plus 0.000000 0.000000 0.000000 ( 0.000589)
concat 0.000000 0.000000 0.000000 ( 0.000084)
splash 0.000000 0.000000 0.000000 ( 0.000596)
concat_all 0.000000 0.000000 0.000000 ( 0.001679)
flat_map 0.000000 0.000000 0.000000 ( 0.001568)
Rehearsal -----------------------------------------------
plus 68.770000 63.320000 132.090000 (265.589506)
concat 20.300000 2.810000 23.110000 ( 23.662007)
splash 79.310000 74.090000 153.400000 (305.013934)
concat_all 83.130000 100.580000 183.710000 (378.988638)
flat_map 20.680000 0.960000 21.640000 ( 21.769550)
------------------------------------ total: 513.950000sec
user system total real
plus 65.310000 70.300000 135.610000 (273.799215)
concat 20.050000 0.610000 20.660000 ( 21.163930)
splash 79.360000 80.000000 159.360000 (316.366122)
concat_all 84.980000 99.880000 184.860000 (383.870653)
flat_map 20.940000 1.760000 22.700000 ( 22.760643)
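One way to probe the guess above about native versus pure-Ruby methods is source_location, which returns nil when a method has no Ruby source file the runtime can point to. This is only a sketch and I have not run it on Rubinius; the list of method names is my own selection, the splat form [*a, *b, *c] is syntax rather than a method and so cannot be checked this way, and newer MRI versions define some core methods in Ruby, so results may vary between versions.

```ruby
# The monkey patch from the benchmark script, as a known pure-Ruby reference.
class Array
  def concat_all
    reduce([], :+)
  end
end

# Methods behind the benchmark cases; :+ is the "plus" case.
METHODS = %i[+ concat flat_map reduce concat_all].freeze

METHODS.each do |name|
  location = Array.instance_method(name).source_location
  # nil usually indicates a native (C or Java) implementation;
  # otherwise location is a [file, line] pair.
  where = location ? location.join(':') : 'no Ruby source (probably native)'
  puts format('%-12s %s', name, where)
end
```

A nil for concat and flat_map but a file and line for the others would support the guess; anything else would suggest the slowdown has a different cause.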
Source: https://stackoverflow.com/questions/40529208/performance-difference-between-mri-ruby-and-jruby