Complexity of grouping in Java 8

慢半拍i 2021-01-04 10:38

I would like to know the time complexity of the statement below (in Java 8).

list.stream().collect(groupingBy(...)); 

Any idea?

1 answer
  • 2021-01-04 11:02

    There is no general answer to that question, as the time complexity depends on every operation in the pipeline. Since the stream has to be processed entirely, there is a base time complexity of O(n), which has to be multiplied by the cost of the operations performed per element. This assumes that the iteration cost itself is not worse than O(n), which is the case for most stream sources.

    So, assuming no intermediate operations that affect the time complexity, groupingBy has to evaluate the classifier function for each element. That evaluation should be independent of the other elements, so it does not affect the time complexity, regardless of how expensive it is, because the O(…) notation only tells us how the time scales with the number of stream elements. Then the element is inserted into a map, and the cost of that insertion may depend on the number of elements the map already contains. Without a custom Map supplier, the map’s type is unspecified, hence no statement can be made here.
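
    To illustrate the role of the Map supplier, here is a minimal sketch using the three-argument overload groupingBy(classifier, mapFactory, downstream). The class and method names are hypothetical, chosen only for this example. With a HashMap the lookup is expected O(1) per element, while a TreeMap costs O(log g) per element, where g is the number of groups present so far.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example class; not part of the original question.
public class MapSupplierSketch {

    // HashMap supplier: expected O(1) per lookup, so O(n) for the grouping.
    static Map<Integer, List<String>> byLengthHashed(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, HashMap::new, Collectors.toList()));
    }

    // TreeMap supplier: O(log g) per lookup, keys come out in sorted order.
    static Map<Integer, List<String>> byLengthSorted(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bb", "a", "cc", "ddd");
        System.out.println(byLengthHashed(words));
        System.out.println(byLengthSorted(words));
    }
}
```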

    In practice, it’s reasonable to assume that the default result will be some sort of hash map with an expected O(1) lookup cost. So we have a net time complexity of O(n) for the grouping itself. Then, we have the downstream collector.
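
    The single pass can be made visible with a small sketch that counts how often the classifier runs for a sequential stream. The class name, method name, and the modulo classifier are assumptions made up for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class; not part of the original question.
public class DefaultGroupingSketch {

    // Groups 0..n-1 by remainder modulo `groups` and returns how many
    // times the classifier function was invoked.
    static int classifierCalls(int n, int groups) {
        AtomicInteger calls = new AtomicInteger();
        Map<Integer, List<Integer>> result = IntStream.range(0, n).boxed()
                .collect(Collectors.groupingBy(i -> {
                    calls.incrementAndGet(); // one increment per element
                    return i % groups;
                }));
        return calls.get();
    }

    public static void main(String[] args) {
        // The number of calls equals the number of stream elements.
        System.out.println(classifierCalls(1000, 7));
    }
}
```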

    The default downstream collector is toList(), which produces an unspecified List type, so again, we can’t say anything about the costs of adding elements to it.

    The current implementation produces an ArrayList, which has to copy its contents whenever the capacity is exceeded. But since the capacity is multiplied by a constant factor each time, adding n elements still has an amortized net complexity of O(n). It’s reasonable to assume that future changes to the toList() implementation won’t make the costs worse than what we have today. So the time complexity of a default groupingBy collection is likely O(n).
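
    The amortized argument can be checked with a small simulation. This sketch does not use ArrayList itself; it models an array-backed list with an assumed initial capacity of 10 and an assumed growth of roughly 1.5x (old capacity plus half of it), and counts how many elements would be copied in total. The class and method names are hypothetical.

```java
// Hypothetical example class; the growth policy below is an assumption
// used for illustration, not a guaranteed property of toList().
public class AmortizedGrowthSketch {

    // Simulates n appends and returns the total number of elements
    // that had to be copied during all resize operations.
    static long copiedElements(int n) {
        int capacity = 10; // assumed initial capacity
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;            // a resize copies every stored element
                capacity += capacity >> 1; // grow by about a factor of 1.5
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // The total stays within a small constant multiple of n.
        System.out.println(copiedElements(n) + " copies for " + n + " appends");
    }
}
```

    Because the capacities form a geometric sequence, the sum of all copies is bounded by a constant times the final size, which is why the per-element cost stays constant on average.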

    If we use a custom Map supplier together with a custom downstream collector, the complexity depends on the ratio between the number of groups and the average number of elements per group. The worst case is the worse of the two costs, the map’s lookup and the downstream collector’s per-element processing, times the number of elements, as we could have one group containing all items or each item in its own group.
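
    The two extremes can be sketched as follows, assuming a TreeMap as the custom map and counting() as the downstream collector (both picked only for illustration, as are the class and method names). With a single group, every lookup hits a one-entry map, so the total is O(n); with one group per element, the TreeMap lookups add up to O(n log n).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class; not part of the original question.
public class GroupingExtremesSketch {

    // Groups 0..n-1 by remainder modulo `groups` into a TreeMap and
    // counts the elements of each group.
    static Map<Integer, Long> countPerGroup(int n, int groups) {
        return IntStream.range(0, n).boxed()
                .collect(Collectors.groupingBy(i -> i % groups, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        int n = 1000;
        // One group containing all items.
        System.out.println(countPerGroup(n, 1).size());
        // Each item in its own group.
        System.out.println(countPerGroup(n, n).size());
    }
}
```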

    But usually, you can predict how the groups of a particular grouping operation will be distributed, so you would want to calculate the time complexity for that particular operation instead of relying on a statement about all grouping operations in general.
