Scala ranges versus lists performance on large collections

醉梦人生 2021-01-01 04:06

I ran a set of performance benchmarks for 10,000,000 elements, and I've discovered that the results vary greatly with each implementation.

Can anybody explain why c

4 Answers
  • 2021-01-01 04:38

    Oh So Many Things going on here!!!

    Let's start with Java int[]. Arrays in Java are the only collection that is not type erased. The run time representation of an int[] is different from the run time representation of Object[], in that it actually uses int directly. Because of that, there's no boxing involved in using it.

    In memory terms, you have 40.000.000 consecutive bytes in memory, that are read and written 4 at a time whenever an element is read or written to.

    In contrast, an ArrayList<Integer> -- as well as pretty much any other generic collection -- is composed of 40.000.000 or 80.000.000 consecutive bytes (on 32-bit and 64-bit JVMs respectively), PLUS 80.000.000 bytes spread all around memory in groups of 8 bytes. Every read and write to an element has to go through two memory spaces, and the sheer time spent handling all that memory is significant when the actual task you are doing is so fast.
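
    To make the comparison concrete, here is a minimal Java sketch of the two layouts. The class and method names are hypothetical, and the byte counts simply restate the estimates given above (4 bytes per int, 4 or 8 bytes per reference, 8 bytes per boxed Integer); they are not measurements.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
final class ArrayVersusBoxedList {
    // Primitive array: every element is read directly from one contiguous block.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Boxed list: every element is a reference that must be followed
    // to a separate Integer object before the int can be read.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // unboxing happens here
        }
        return total;
    }

    // Element payload of an int[]: 4 bytes per element (array header ignored).
    static long intArrayBytes(long n) {
        return 4L * n;
    }

    // Reference array plus the boxed objects, using the answer's
    // assumed figure of 8 bytes per Integer.
    static long boxedListBytes(long n, int bytesPerReference) {
        return n * bytesPerReference + 8L * n;
    }
}
```

    With n = 10.000.000 this gives 40.000.000 bytes for the int[], against 120.000.000 or 160.000.000 bytes for the boxed list, depending on the reference size.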

    So, back to Scala, for the second example where you manipulate a List. Now, Scala's List is much more like Java's LinkedList than the grossly misnamed ArrayList. Each element of a List is composed of an object called Cons, which has 16 bytes, with a pointer to the element and a pointer to another list. So, a List of 10.000.000 elements is composed of 160.000.000 bytes spread all around memory in groups of 16 bytes, plus 80.000.000 bytes spread all around memory in groups of 8 bytes. So what was true for ArrayList is even more so for List.
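
    The shape of such a list can be sketched in Java as follows. This is a hypothetical stand-in for Scala's cons cell, not the actual Scala class: the names ConsList, Cons, rangeToList and length are invented for illustration, and null plays the role of the empty list.

```java
// Hypothetical sketch of an immutable singly linked list of cons cells.
final class ConsList {
    static final class Cons {
        final Object head; // reference to the (boxed) element
        final Cons tail;   // reference to the rest of the list, null when empty

        Cons(Object head, Cons tail) {
            this.head = head;
            this.tail = tail;
        }
    }

    // Materializes from..to (inclusive) as a linked list. It is built back to
    // front so that each new cell is simply prepended to the existing list.
    static Cons rangeToList(int from, int to) {
        Cons list = null;
        for (int i = to; i >= from; i--) {
            list = new Cons(i, list); // i is boxed to an Integer here
        }
        return list;
    }

    // Walking the list follows one tail pointer per element.
    static int length(Cons list) {
        int n = 0;
        for (Cons c = list; c != null; c = c.tail) {
            n++;
        }
        return n;
    }
}
```

    Every element costs one cell allocation plus one boxed value, and traversal hops from cell to cell instead of scanning a contiguous block.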

    Finally, Range. A Range is a sequence of integers with a lower and an upper boundary, plus a step. A Range of 10.000.000 elements is 40 bytes: three ints (not generic) for lower and upper bounds and step, plus a few pre-computed values (last, numRangeElements) and two other ints used for lazy val thread safety. Just to make clear, that's NOT 40 times 10.000.000: that's 40 bytes TOTAL. The size of the range is completely irrelevant, because IT DOESN'T STORE THE INDIVIDUAL ELEMENTS. Just the lower bound, upper bound and step.
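
    A rough Java sketch of the idea, under the assumption of an inclusive upper bound and a positive step only. IntRange and its methods are hypothetical names, and Scala's real Range handles more cases than this.

```java
// Hypothetical sketch: a range that stores only its bounds and step.
final class IntRange {
    final int start;
    final int end;  // inclusive
    final int step; // positive in this sketch

    IntRange(int start, int end, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive in this sketch");
        }
        this.start = start;
        this.end = end;
        this.step = step;
    }

    // Number of elements, computed from the three fields alone.
    int size() {
        if (end < start) {
            return 0;
        }
        return Math.toIntExact(((long) end - start) / step + 1);
    }

    // The element at a given index is computed on request, never stored.
    int get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return start + index * step;
    }
}
```

    An IntRange covering 0 to 10.000.000 occupies the same three int fields as one covering 0 to 10.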

    Now, because a Range is a Seq[Int], it still has to go through boxing for most uses: an int will be converted into an Integer and then back into an int again, which is sadly wasteful.
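
    The same round trip can be seen in plain Java whenever an int passes through a generic interface. The sketch below is only an analogy, with hypothetical names: it contrasts the generic Function<Integer, Integer> with the primitive-specialized IntUnaryOperator from java.util.function.

```java
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch contrasting a generic call path with a primitive one.
final class BoxingSketch {
    // Generic path: each int is boxed into an Integer for the call,
    // and the Integer result is unboxed again before it is added.
    static long sumGeneric(int n, Function<Integer, Integer> f) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += f.apply(i);
        }
        return total;
    }

    // Primitive path: ints go in and come out without any Integer objects.
    static long sumPrimitive(int n, IntUnaryOperator f) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += f.applyAsInt(i);
        }
        return total;
    }
}
```

    Both methods compute the same result; the difference is only in how much boxing work happens per element.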

    Cons Size Calculation

    So, here's a tentative calculation of Cons. First of all, read this article about some general guidelines on how much memory an object takes. The important points are:

    1. Java uses 8 bytes for normal objects, and 12 for object arrays, for "housekeeping" information (what's the class of this object, etc).
    2. Objects are allocated in 8-byte chunks. If your object's size is not a multiple of 8 bytes, it will be padded up to the next multiple.

    I actually thought it was 16 bytes, not 8. Anyway, Cons is also smaller than I thought. Its fields are:

    public static final long serialVersionUID; // static, doesn't count
    private java.lang.Object scala$collection$immutable$$colon$colon$$hd;
    private scala.collection.immutable.List tl;
    

    References are at least 4 bytes (could be more on a 64-bit JVM). So we have:

    8 bytes Java header
    4 bytes hd
    4 bytes tl
    

    Which makes it only 16 bytes long. Pretty good, actually. In the example, hd will point to an Integer object, which I assume is 8 bytes long. As for tl, it points to another cons, which we are already counting.
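
    The arithmetic above can be written down as a small Java sketch. The header and reference sizes are parameters because they are this answer's assumptions, not guaranteed values for any particular JVM; the class and method names are hypothetical.

```java
// Hypothetical sketch of the size estimate, using the two rules listed above.
final class ObjectSizeEstimate {
    // Rule 2: sizes are rounded up to the next multiple of 8 bytes.
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // A cons cell holds a header plus two references (hd and tl).
    static long consBytes(int headerBytes, int referenceBytes) {
        return alignTo8(headerBytes + 2L * referenceBytes);
    }

    // Total for all the cells of a list with the given number of elements.
    static long listCellBytes(long elements, int headerBytes, int referenceBytes) {
        return elements * consBytes(headerBytes, referenceBytes);
    }
}
```

    With an 8-byte header and 4-byte references, consBytes gives 16 and a 10.000.000-element list needs 160.000.000 bytes for the cells alone. Under the same two rules, an object holding a single 4-byte int would come out at alignTo8(8 + 4) = 16 bytes rather than 8, which suggests the 8-byte figure assumed for Integer is on the low side.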

    I'm going to revise the estimates, with actual data where possible.

  • 2021-01-01 04:43

    This is an educated guess ...

    I think it is because in the fast version the Scala compiler is able to translate the key statement into something like this (in Java):

    List<Integer> millions = new ArrayList<Integer>();
    for (int i = 0; i <= 10000000; i++) {
        if (i % 1000000 == 0) {
            millions.add(i);
        }
    }
    

    As you can see, (0 to 10000000) doesn't generate an intermediate list of 10,000,000 Integer objects.

    By contrast, in the slow version the Scala compiler is not able to do that optimization, and is generating that list.

    (The intermediate data structure could possibly be an int[], but the observed JVM size suggests that it is not.)
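
    For comparison, the slow version would correspond to something like the second method in the following Java sketch. This continues the educated guess: the names are hypothetical, and the sketch illustrates the two shapes of code, not what the Scala compiler actually emits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two shapes discussed in this answer.
final class FilterSketch {
    // Fast shape: test each int as the loop produces it and box only the matches.
    static List<Integer> filterOnTheFly(int upTo, int divisor) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i <= upTo; i++) {
            if (i % divisor == 0) {
                kept.add(i);
            }
        }
        return kept;
    }

    // Slow shape: first materialize every value as a boxed Integer,
    // then walk the intermediate list a second time to filter it.
    static List<Integer> materializeThenFilter(int upTo, int divisor) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i <= upTo; i++) {
            all.add(i);
        }
        List<Integer> kept = new ArrayList<>();
        for (Integer v : all) {
            if (v % divisor == 0) {
                kept.add(v);
            }
        }
        return kept;
    }
}
```

    Both methods return the same result; the second one allocates one Integer per value in the whole range before discarding almost all of them.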

  • 2021-01-01 04:47

    It's hard to read the Scala source on my iPad, but it looks like Range's constructor isn't actually producing a list, just remembering the start, increment and end. It uses these to produce its values on request, so that iterating over a range is a lot closer to a simple for loop than examining the elements of an array.

    As soon as you say range.toList you are forcing Scala to produce a linked list of the 'values' in the range (allocating memory for both the values and the links), and then you are iterating over that. Because it is a linked list, its performance is going to be worse than that of your Java ArrayList example.

  • 2021-01-01 04:55

    In the first example you create a linked list with 10 elements by computing the steps of the range.

    In the second example you create a linked list with 10 million elements and filter it down to a new linked list with 10 elements.

    In the third example you create an array-backed buffer with 10 million elements, which you traverse and print; no new array-backed buffer is created.

    Conclusion:

    Every piece of code does something different, that's why the performance varies greatly.
