I ran a set of performance benchmarks for 10,000,000 elements, and I've discovered that the results vary greatly with each implementation.
Can anybody explain why c
Oh So Many Things going on here!!!
Let's start with Java int[]. Arrays in Java are the only collection that is not type erased. The runtime representation of an int[] is different from the runtime representation of Object[], in that it actually uses int directly. Because of that, there's no boxing involved in using it.
In memory terms, you have 40,000,000 consecutive bytes, which are read or written 4 at a time whenever an element is accessed.
In contrast, an ArrayList<Integer> (as well as pretty much any other generic collection) is composed of 40,000,000 or 80,000,000 consecutive bytes (on 32-bit and 64-bit JVMs respectively), PLUS 80,000,000 bytes spread all around memory in groups of 8 bytes. Every read and write to an element has to go through two memory spaces, and the sheer time spent handling all that memory is significant when the actual task you are doing is so fast.
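To make the arithmetic above concrete, here is a rough sketch in Java. The class and method names are made up for illustration, and the per-element sizes (4 or 8 bytes per reference, 8 bytes per boxed Integer) are simply the figures quoted in this answer, passed in as parameters. They are not measurements, and real object sizes depend on the JVM.

```java
// Back-of-the-envelope footprint arithmetic using the figures quoted above.
// All per-element sizes are assumptions supplied by the caller, not measurements.
final class FootprintSketch {
    // int[]: 4 bytes per element, stored contiguously (array header ignored).
    static long intArrayBytes(long n) {
        return 4L * n;
    }

    // ArrayList<Integer>: one reference per slot plus one boxed object per element.
    static long boxedListBytes(long n, int refBytes, int boxBytes) {
        return n * refBytes + n * boxBytes;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        System.out.println("int[]:                    " + intArrayBytes(n));
        System.out.println("ArrayList<Integer> 32-bit: " + boxedListBytes(n, 4, 8));
        System.out.println("ArrayList<Integer> 64-bit: " + boxedListBytes(n, 8, 8));
    }
}
```

The point is less the exact totals than the shape: the array is one contiguous block, while the boxed list is one block of references plus n separately allocated objects.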
So, back to Scala, for the second example where you manipulate a List. Now, Scala's List is much more like Java's LinkedList than the grossly misnamed ArrayList. Each element of a List is composed of an object called Cons, which has 16 bytes, with a pointer to the element and a pointer to another list. So, a List of 10,000,000 elements is composed of 160,000,000 bytes spread all around memory in groups of 16 bytes, plus 80,000,000 bytes spread all around memory in groups of 8 bytes. So what was true for ArrayList is even more so for List.
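For anyone who hasn't seen a cons list, here is a minimal sketch of the structure in Java. ConsCell is a hypothetical stand-in written for this answer, not Scala's actual class; it only mirrors the "pointer to the element, pointer to another list" layout described above.

```java
// Hypothetical stand-in for a cons cell; not Scala's actual implementation.
final class ConsCell {
    final Integer head;   // pointer to a boxed element stored elsewhere on the heap
    final ConsCell tail;  // pointer to the rest of the list, or null for the empty list

    ConsCell(Integer head, ConsCell tail) {
        this.head = head;
        this.tail = tail;
    }

    // Builds the list lo, lo+1, ..., hi by prepending from the back.
    static ConsCell fromRange(int lo, int hi) {
        ConsCell list = null;
        for (int i = hi; i >= lo; i--) {
            list = new ConsCell(i, list);
        }
        return list;
    }

    // Traversal chases pointers: tail to reach the next cell, head to reach the value.
    static long sum(ConsCell list) {
        long total = 0;
        for (ConsCell c = list; c != null; c = c.tail) {
            total += c.head;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(fromRange(1, 1000)));
    }
}
```

Every element costs one cell allocation plus one boxed value, and nothing guarantees that neighbouring cells sit next to each other in memory.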
Finally, Range. A Range is a sequence of integers with a lower and an upper boundary, plus a step. A Range of 10,000,000 elements is 40 bytes: three ints (not generic) for lower and upper bounds and step, plus a few pre-computed values (last, numRangeElements) and two other ints used for lazy val thread safety. Just to make clear, that's NOT 40 times 10,000,000: that's 40 bytes TOTAL. The size of the range is completely irrelevant, because IT DOESN'T STORE THE INDIVIDUAL ELEMENTS. Just the lower bound, upper bound and step.
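Here is a small sketch of the idea in Java. IntRangeSketch is a hypothetical class written for illustration, not Scala's Range; it assumes a positive step and an inclusive upper bound, and leaves out the pre-computed fields mentioned above.

```java
// Hypothetical Range-like class: three ints and no element storage.
// Not Scala's implementation; assumes a positive step and an inclusive end.
final class IntRangeSketch {
    final int start;
    final int end;
    final int step;

    IntRangeSketch(int start, int end, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.start = start;
        this.end = end;
        this.step = step;
    }

    // Number of elements, computed from the bounds rather than stored.
    int size() {
        if (end < start) {
            return 0;
        }
        return (int) (((long) end - start) / step + 1);
    }

    // The i-th element is computed on request.
    int get(int i) {
        if (i < 0 || i >= size()) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return start + i * step;
    }

    public static void main(String[] args) {
        IntRangeSketch r = new IntRangeSketch(0, 10_000_000, 1_000_000);
        for (int i = 0; i < r.size(); i++) {
            System.out.println(r.get(i));
        }
    }
}
```

The object is the same size whether it describes ten values or ten million.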
Now, because a Range is a Seq[Int], it still has to go through boxing for most uses: an int will be converted into an Integer and then back into an int again, which is sadly wasteful.
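The same int to Integer to int round trip can be shown in plain Java. This is only an analogy using Java streams (BoxingSketch and its method names are made up for the example); it says nothing about how the Scala compiler actually emits code for Range.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Java analogy for the boxing round trip; not a model of Scala's generated code.
final class BoxingSketch {
    // Primitive path: values stay as int throughout.
    static int[] multiplesPrimitive(int upTo, int k) {
        return IntStream.rangeClosed(0, upTo)
                .filter(i -> i % k == 0)
                .toArray();
    }

    // Boxed path: every value becomes an Integer, then is unboxed for the modulo test.
    static List<Integer> multiplesBoxed(int upTo, int k) {
        return IntStream.rangeClosed(0, upTo)
                .boxed()                   // int -> Integer
                .filter(i -> i % k == 0)   // Integer -> int for the arithmetic
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(multiplesPrimitive(100, 10).length);
        System.out.println(multiplesBoxed(100, 10));
    }
}
```

Both methods return the same values; the second one just does extra conversions along the way.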
Cons Size Calculation
So, here's a tentative calculation of Cons. First of all, read this article about some general guidelines on how much memory an object takes. The important points are:
I actually thought it was 16 bytes, not 8. Anyway, Cons is also smaller than I thought. Its fields are:
public static final long serialVersionUID; // static, doesn't count
private java.lang.Object scala$collection$immutable$$colon$colon$$hd;
private scala.collection.immutable.List tl;
References are at least 4 bytes (could be more on 64 bits JVM). So we have:
8 bytes Java header
4 bytes hd
4 bytes tl
Which makes it only 16 bytes long. Pretty good, actually. In the example, hd will point to an Integer object, which I assume is 8 bytes long. As for tl, it points to another cons, which we are already counting.
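Putting the numbers above together, as a sketch: the constants below are just the estimates from this answer (8-byte header, 4-byte references, 8-byte Integer), and the class name is made up. On a 64-bit JVM the reference size, and so the totals, could be larger.

```java
// Arithmetic only: sizes are the estimates quoted above, not measured values.
final class ConsSizeSketch {
    static final int HEADER_BYTES = 8;   // object header, per the estimate above
    static final int REF_BYTES = 4;      // per reference; may be larger on a 64-bit JVM
    static final int INTEGER_BYTES = 8;  // assumed size of one boxed Integer

    // One cons cell: header plus the hd and tl references.
    static int consBytes() {
        return HEADER_BYTES + 2 * REF_BYTES;
    }

    // A list of n elements: n cells plus n boxed values.
    static long listBytes(long n) {
        return n * (consBytes() + INTEGER_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("bytes per cons cell: " + consBytes());
        System.out.println("bytes for 10,000,000 elements: " + listBytes(10_000_000L));
    }
}
```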
I'm going to revise the estimates, with actual data where possible.
This is an educated guess ...
I think it is because in the fast version the Scala compiler is able to translate the key statement into something like this (in Java):
    List<Integer> millions = new ArrayList<Integer>();
    for (int i = 0; i <= 10000000; i++) {
        if (i % 1000000 == 0) {
            millions.add(i);
        }
    }
As you can see, (0 to 10000000) doesn't generate an intermediate list of 10,000,000 Integer objects.
By contrast, in the slow version the Scala compiler is not able to do that optimization, and is generating that list.
(The intermediate data structure could possibly be an int[], but the observed JVM size suggests that it is not.)
It's hard to read the Scala source on my iPad, but it looks like Range's constructor isn't actually producing a list, just remembering the start, increment and end. It uses these to produce its values on request, so that iterating over a range is a lot closer to a simple for loop than examining the elements of an array.
As soon as you say range.toList you are forcing Scala to produce a linked list of the 'values' in the range (allocating memory for both the values and the links), and then you are iterating over that. Because it is a linked list, its performance is going to be worse than your Java ArrayList example.
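A rough Java analogy of the two approaches, under the assumption that toList behaves like "build a linked list of every value first". The class and method names are hypothetical, and java.util.LinkedList is only standing in for Scala's List here.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Java analogy only: LinkedList stands in for the list produced by range.toList.
final class MaterializeSketch {
    // Iterate the range directly: no intermediate collection is built.
    static List<Integer> direct(int upTo, int k) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i <= upTo; i++) {
            if (i % k == 0) {
                out.add(i);
            }
        }
        return out;
    }

    // Materialize first: one list node per value is allocated, then the list is filtered.
    static List<Integer> viaLinkedList(int upTo, int k) {
        List<Integer> all = new LinkedList<>();
        for (int i = 0; i <= upTo; i++) {
            all.add(i);
        }
        List<Integer> out = new ArrayList<>();
        for (Integer v : all) {
            if (v % k == 0) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A smaller bound than the question's 10,000,000 keeps the demo light on memory.
        System.out.println(direct(1_000_000, 100_000));
        System.out.println(viaLinkedList(1_000_000, 100_000));
    }
}
```

Both return the same result; the second allocates and walks a node for every value before it keeps a handful of them.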
In the first example you create a linked list with 10 elements by computing the steps of the range.
In the second example you create a linked list with 10 million elements and filter it down to a new linked list with 10 elements.
In the third example you create an array-backed buffer with 10 million elements, which you traverse and print; no new array-backed buffer is created.
Conclusion:
Every piece of code does something different; that's why the performance varies greatly.