I would like to learn the time complexity of the given statement below (in Java 8):
list.stream().collect(groupingBy(...));
Any idea?
There is no general answer to that question, as the time complexity depends on all operations involved. Since the stream has to be processed entirely, there is a base time complexity of O(n) that has to be multiplied by the costs of all operations done per element. This assumes that the cost of the iteration itself is not worse than O(n), which is the case for most stream sources.
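To make the per-element cost concrete, here is a small sketch (the counter and the i % 10 classifier are purely for illustration) that counts how often the classifier function is evaluated during a groupingBy collect:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ClassifierCalls {
    public static void main(String[] args) {
        // Counter with a side effect, for measurement only.
        AtomicLong calls = new AtomicLong();
        List<Integer> list = IntStream.range(0, 10_000)
                .boxed()
                .collect(Collectors.toList());

        list.stream().collect(Collectors.groupingBy(i -> {
            calls.incrementAndGet(); // record each classifier evaluation
            return i % 10;           // the classifier itself is O(1) here
        }));

        // For a sequential stream, the classifier runs exactly once per
        // element, so the collect is O(n) times the per-element cost.
        System.out.println(calls.get()); // 10000
    }
}
```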
So, assuming no intermediate operations that affect the time complexity, groupingBy has to evaluate the classifier function for each element. This evaluation should be independent of the other elements and therefore does not affect the time complexity, regardless of how expensive it is, as the O(…) notation only tells us how the time scales with the number of stream elements. Then, it will insert each element into a map, which might depend on the number of already contained elements. Without a custom Map supplier, the map's type is unspecified, hence no statement can be made here.
In practice, it’s reasonable to assume that the result will be some sort of hashing map with a net O(1) lookup complexity by default. So we have a net time complexity of O(n) for the grouping. Then, we have the downstream collector.
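As a quick illustration, the following sketch inspects what the default collector actually returns; the HashMap result is an observation about current JDK implementations, not a guarantee of the specification:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DefaultMapType {
    public static void main(String[] args) {
        Map<Integer, List<String>> byLength = Arrays
                .asList("a", "bb", "cc", "ddd")
                .stream()
                .collect(Collectors.groupingBy(String::length));

        // The spec only promises "a Map"; the current OpenJDK implementation
        // happens to return a HashMap, whose lookup is O(1) on average.
        System.out.println(byLength.getClass().getSimpleName()); // HashMap in current JDKs
        System.out.println(byLength);
    }
}
```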
The default downstream collector is toList(), which produces an unspecified List type, so again, we can’t say anything about the costs of adding elements to it.
The current implementation produces an ArrayList, which has to perform copy operations when its capacity is exceeded, but since the capacity is raised by a constant factor each time, there is still a net complexity of O(n) for adding n elements. It’s reasonable to assume that future changes to the toList() implementation won’t make the costs worse than what we have today. So the time complexity of a default groupingBy collect operation is likely O(n).
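The amortization argument can be sketched by simulating a grow-by-half policy (the initial capacity of 10 and the 1.5x growth factor mirror the current ArrayList implementation, an implementation detail rather than a guarantee) and counting how many element copies n appends trigger:

```java
public class AmortizedGrowth {
    public static void main(String[] args) {
        int n = 1_000_000;
        int capacity = 10;   // ArrayList's current default initial capacity
        long copies = 0;     // elements moved during resize operations

        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;            // a resize copies all current elements
                capacity += capacity >> 1; // grow by ~1.5x, like ArrayList
            }
        }

        // The resize sizes form a geometric series, so the total copy work
        // stays within a constant factor of n: n appends cost O(n) overall.
        System.out.println(copies);
        System.out.println(copies < 3L * n); // true
    }
}
```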
If we use a custom Map supplier with a custom downstream collector, the complexity depends on the ratio of the number of groups to the number of elements per group. The worst case is the worse of the two factors, the map’s lookup and the downstream collector’s per-element processing (times the number of elements), as we could have one group containing all items or each item in its own group.
But usually, you can predict a bias for a particular grouping operation, so you would want to estimate the time complexity of that particular operation instead of relying on a statement about grouping operations in general.