Should I always use a parallel stream when possible?

走了就别回头了 2020-11-22 03:06

With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the docs, the second one using parall

6 answers
  •  误落风尘
    2020-11-22 03:36

    I watched one of Brian Goetz's presentations (Java Language Architect & specification lead for Lambda Expressions). He explains in detail the following four points to consider before parallelizing:

    Splitting / decomposition costs
    – Sometimes splitting is more expensive than just doing the work!
    Task dispatch / management costs
    – You can do a lot of work in the time it takes to hand work to another thread.
    Result combination costs
    – Sometimes combination involves copying lots of data. For example, adding numbers is cheap whereas merging sets is expensive.
    Locality
    – The elephant in the room. This is an important point that is easy to miss. Consider cache misses: if a CPU stalls waiting for data because of cache misses, parallelization gains you little. That is why array-based sources parallelize best: the next indices (near the current index) are already cached, so the CPU is less likely to experience a cache miss.
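
To make the splitting and locality points concrete, here is a minimal sketch that sums the same numbers from an array and from a `LinkedList`, sequentially and in parallel. The class and method names (`SourceShapeSketch`, `sumArray`, `sumLinkedList`) are made up for illustration and do not come from the talk. The timing in `main` is a crude single measurement, not a proper benchmark, so treat any numbers it prints as a rough hint only.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.LongStream;

// Hypothetical illustration class; the names are not from the talk.
final class SourceShapeSketch {

    // Array source: it can be split by index range and is stored contiguously,
    // so decomposition is cheap and neighbouring elements tend to be cached.
    static long sumArray(long[] values, boolean parallel) {
        LongStream stream = Arrays.stream(values);
        return (parallel ? stream.parallel() : stream).sum();
    }

    // LinkedList source: splitting has to walk the nodes, the nodes may be
    // scattered across the heap, and every element is a boxed Long.
    static long sumLinkedList(List<Long> values, boolean parallel) {
        return (parallel ? values.parallelStream() : values.stream())
                .mapToLong(Long::longValue)
                .sum();
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long[] array = LongStream.rangeClosed(1, n).toArray();
        List<Long> linked = new LinkedList<>();
        for (long v : array) {
            linked.add(v);
        }

        // Crude timing only: no warm-up, single run per configuration.
        for (boolean parallel : new boolean[] {false, true}) {
            long t0 = System.nanoTime();
            long a = sumArray(array, parallel);
            long t1 = System.nanoTime();
            long b = sumLinkedList(linked, parallel);
            long t2 = System.nanoTime();
            System.out.printf("parallel=%b array: %d us, linked list: %d us (sums %d / %d)%n",
                    parallel, (t1 - t0) / 1_000, (t2 - t1) / 1_000, a, b);
        }
    }
}
```

Both methods return the same sum in either mode; only the cost of getting there differs. For trustworthy measurements a harness such as JMH would be the better tool.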

    He also mentions a relatively simple formula for estimating the chance of a parallel speedup.

    NQ Model:

    N x Q > 10000

    where
    N = number of data items
    Q = amount of work per item
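
The NQ rule of thumb above could be encoded roughly as follows. This is only a sketch: the helper names (`NqModel`, `likelyWorthParallel`, `sumOfSquares`) are hypothetical, and treating Q as a unitless cost estimate supplied by the caller, where 1 stands for a trivial operation such as one addition, is an assumption of this example. The threshold of 10000 is the one from the formula.

```java
import java.util.stream.LongStream;

// Hypothetical helper; not part of the JDK or of the talk.
final class NqModel {

    // Threshold taken from the rule of thumb N x Q > 10000.
    static final double THRESHOLD = 10_000.0;

    // itemCount is N; workPerItem is Q, a rough caller-supplied estimate
    // (assumption: 1.0 means roughly one trivial operation per item).
    static boolean likelyWorthParallel(long itemCount, double workPerItem) {
        return itemCount * workPerItem > THRESHOLD;
    }

    // Switches the stream to parallel mode only when the heuristic says so.
    static long sumOfSquares(long n, double workPerItem) {
        LongStream stream = LongStream.range(0, n);
        if (likelyWorthParallel(n, workPerItem)) {
            stream = stream.parallel();
        }
        return stream.map(i -> i * i).sum();
    }

    public static void main(String[] args) {
        System.out.println(likelyWorthParallel(100, 1.0));        // false: 100 x 1 = 100
        System.out.println(likelyWorthParallel(1_000_000, 1.0));  // true: 1000000 x 1 = 1000000
        System.out.println(likelyWorthParallel(100, 500.0));      // true: 100 x 500 = 50000
        System.out.println(sumOfSquares(10, 1.0));                // 285
    }
}
```

The heuristic only says whether a speedup is plausible; the splitting, combination, and locality costs listed earlier still decide whether it materializes, so measuring remains the final word.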
