Scala performance question

Happy的楠姐 2021-02-03 11:45

In an article by Daniel Korzekwa, he says that the performance of the following code:

list.map(e => e*2).filter(e => e>10)

is muc

7 answers
  • 2021-02-03 11:52

    To avoid traversing the list twice, I think the for syntax is a nice option here:

    val list2 = for(v <- list1; e = v * 2; if e > 10) yield e
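
    One caveat worth checking: as far as I know, a for comprehension with a value definition is rewritten by the compiler into map / withFilter / map calls, so it still builds an intermediate list of pairs rather than doing a single pass. A rough sketch of the expansion, where the contents of list1 are made up for illustration and the exact expansion may differ between compiler versions:

    ```scala
    object ForDesugarSketch {
      // Hypothetical input, not from the original question.
      val list1 = List(1, 5, 6, 7)

      // The for comprehension from the answer above.
      val sugared = for (v <- list1; e = v * 2; if e > 10) yield e

      // Roughly what it expands to: a map that pairs each element with its
      // doubled value, a lazy withFilter, and a final map that drops the pair.
      val desugared = list1
        .map { v => val e = v * 2; (v, e) }
        .withFilter { case (_, e) => e > 10 }
        .map { case (_, e) => e }
    }
    ```

    Both values hold the same elements; the point is only that the sugared form is not automatically a single traversal.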
    
  • 2021-02-03 11:54

    The solution lies mostly with the JVM. Though Scala has a workaround in the form of @specialized, that hugely increases the size of any specialized class, and it only solves half the problem; the other half is the creation of temporary objects.

    The JVM actually does a good job optimizing a lot of it, or the performance would be even more terrible, but Java does not require the optimizations that Scala does, so the JVM does not provide them. I expect that to change to some extent with the introduction of SAM not-real-closures in Java.

    But, in the end, it comes down to balancing the needs. The same while loop that Java and Scala run so much faster than Scala's functional equivalent can be made faster yet in C. Yet, despite what the microbenchmarks tell us, people use Java.

  • 2021-02-03 11:54

    Rex Kerr correctly states the major problem: operating on immutable lists, the code in question creates intermediate lists in memory. Note that this is not necessarily slower than equivalent Java code; you just never use immutable data structures in Java.

    Wilfried Springer has a nice, idiomatic Scala solution. Using view, no (manipulated) copies of the whole list are created.

    Note that using view might not always be ideal. For example, if your first call is a filter that is expected to throw away most of the list, it might be worthwhile to create the shorter version explicitly and use view only after that, in order to improve memory locality for later iterations.
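
    The answer referred to is not shown on this page, so the following is only a sketch of what a view-based version might look like, with a made-up input list. The second value illustrates the "filter strictly first, then go lazy" variant; its predicate (_ > 5) is a hypothetical stand-in for a selective filter:

    ```scala
    object ViewSketch {
      // Hypothetical input, not from the original question.
      val list = List(1, 5, 6, 7)

      // Lazy pipeline: each element goes through map and filter in turn when
      // toList forces the view, so no intermediate list is built.
      val viaView = list.view.map(_ * 2).filter(_ > 10).toList

      // Variant: shrink the list strictly first, then use a view for the rest.
      val filterFirst = list.filter(_ > 5).view.map(_ * 2).toList
    }
    ```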

  • 2021-02-03 12:01

    list.filter(e => e*2>10).map(e => e*2)

    This version filters the list first, so the second traversal runs over fewer elements.

  • 2021-02-03 12:04

    That particular code is slow because it works on primitives but uses generic operations, so the primitives have to be boxed. (This could be improved if List and its ancestors were specialized.) This will probably slow things down by a factor of 5 or so.

    Also, algorithmically, those operations are somewhat expensive, because you make a whole list, and then make a whole new list throwing a few components of the intermediate list away. If you did it in one swoop, then you'd be better off. You could do something like:

    list collect { case e if (e*2>10) => e*2 }
    

    but what if the calculation e*2 is really expensive? Then you could

    list.foldLeft(List[Int]())((ls, e) => { val x = e*2; if (x>10) x :: ls else ls })  // same as (List[Int]() /: list)(...)
    

    except that this would appear backwards. (You could reverse it if need be, but that requires creating a new list, which again isn't ideal algorithmically.)

    Of course, you have the same sort of algorithmic problems in Java if you're using a singly linked list--your new list will end up backwards, or you have to create it twice, first in reverse and then forwards, or you have to build it with (non-tail) recursion (which is easy in Scala, but inadvisable for this sort of thing in either language since you'll exhaust the stack), or you have to create a mutable list and then pretend afterwards that it's not mutable. (Which, incidentally, you can do in Scala also--see mutable.LinkedList.)
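
    As a sketch of the "build it mutably, then treat it as immutable" option, here is a version using ListBuffer instead of the mutable.LinkedList mentioned above (that is my substitution, and the name doubledOver10 is made up). It computes e*2 once per element and keeps the original order:

    ```scala
    import scala.collection.mutable.ListBuffer

    object ListBufferSketch {
      // Hypothetical helper name, for illustration only.
      def doubledOver10(list: List[Int]): List[Int] = {
        val buf = ListBuffer.empty[Int]
        for (e <- list) {
          val x = e * 2          // computed once per element
          if (x > 10) buf += x   // appended at the end, so order is preserved
        }
        buf.toList               // hand the result back as an immutable List
      }
    }
    ```

    For example, ListBufferSketch.doubledOver10(List(1, 5, 6, 7)) gives List(12, 14) in a single traversal, with no reverse step.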

  • 2021-02-03 12:04

    The Scala approach is much more abstract and generic. Therefore it is hard to optimize every single case.

    I could imagine that the HotSpot JIT compiler might apply stream and loop fusion to the code in the future if it sees that the intermediate results are not used.

    Additionally the Java code just does much more.

    If you really just want to mutate a data structure in place, consider transform. It looks a bit like map but doesn't create a new collection, e.g.:

    val array = Array(1,2,3,4,5,6,7,8,9,10).transform(_ * 2)
    // array is now WrappedArray(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
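
    On newer versions, a sketch under the assumption that you are on Scala 2.13 or later: as far as I know, transform is deprecated there in favour of mapInPlace, which can be called directly on an Array:

    ```scala
    object MapInPlaceSketch {
      val array = Array(1, 2, 3, 4, 5)
      array.mapInPlace(_ * 2)   // mutates array itself; no new array is allocated
    }
    ```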
    

    I really hope some additional in-place operations will be added in the future...
