Java 8 Streams: count all elements which enter the terminal operation

Submitted by 泪湿孤枕 on 2021-02-04 16:53:06

Question


I wonder whether there is a nicer (or just another) approach to get the count of all items that enter the terminal operation of a stream, instead of the following:

Stream<T> stream = ... // given as parameter
AtomicLong count = new AtomicLong();
stream.filter(...).map(...)
      .peek(t -> count.incrementAndGet())

where count.get() gives me the actual count of the processed items at that stage.

I deliberately omitted the terminal operation, as it might change between .forEach, .reduce or .collect. I already know about .count, but it seems to work well only if I replace the .forEach with a .map and use .count as the terminal operation instead. That, however, feels like a misuse of .map.

What I don't really like about the above solution: if a filter is added after the peek, it only counts the elements at that specific stage, not the ones that actually reach the terminal operation.

The other approach that comes to mind is to collect the filtered and mapped values into a list, operate on that, and just call list.size() to get the count. However, this will not work if collecting the stream leads to an error, whereas with the above solution I could still have a count of all items processed so far, provided an appropriate try/catch is in place. That isn't a hard requirement, though.
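A minimal sketch of the trade-off described above, assuming a made-up input list and a simulated failure in the terminal operation. The class name, the method name and the "boom" marker are illustrative only and not part of the question:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class PeekCountDemo {

    // Returns how many elements reached the terminal operation before it
    // either finished or threw. The counter survives the exception because
    // it lives outside the stream pipeline.
    static long countUntilFailure(List<String> input) {
        AtomicLong count = new AtomicLong();
        try {
            input.stream()
                 .filter(s -> !s.isEmpty())
                 .map(String::trim)
                 .peek(s -> count.incrementAndGet()) // last stage before the terminal operation
                 .forEach(s -> {
                     if (s.equals("boom")) {
                         throw new IllegalStateException("simulated failure");
                     }
                 });
        } catch (IllegalStateException e) {
            // Ignored on purpose: we only care about the count so far.
        }
        return count.get();
    }

    public static void main(String[] args) {
        // "a" and "b" pass, "" is filtered out, "boom" is counted and then fails.
        System.out.println(countUntilFailure(Arrays.asList("a", "", "b", "boom", "c"))); // prints 3
    }
}
```

With a collect-to-list approach the same failure would leave no list to call size() on, which is the difference the question points out.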


Answer 1:


It seems you already have the cleanest solution via peek before the terminal operation, IMO. The only reason I can think of for needing this is debugging, and if that is the case, then peek was designed for exactly that. Wrapping the Stream and providing separate implementations is way too much, given the huge amount of time it takes and the later support needed for everything that gets added to Streams.

As for what happens if another filter is added: well, provide a code comment (lots of us do that) and a few test cases that would otherwise fail, for example.


Just my $0.02




Answer 2:


The best idea I can think of is to map each element to itself and count the invocations of the mapping function while doing so.

stream.map(object -> {counter.incrementAndGet(); return object;});

Since this lambda can be reused, and any lambda can be replaced by an object, you can create a counter object like this:

import java.util.function.Function;

// Identity function that counts how many elements pass through it.
class StreamCounter<T> implements Function<T, T> {
  int counter = 0;
  public T apply(T object) { counter++; return object; }
  public int get() { return counter; }
}

So using:

StreamCounter<String> myCounter = new ...;
stream.map(myCounter)...
int count = myCounter.get();
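As a rough, self-contained illustration of that usage (the word list, the filter and the names StreamCounterDemo and countLongWords are assumptions made for this example), note that the counter is only meaningful after a terminal operation has run, because streams are evaluated lazily:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamCounterDemo {

    // Identity mapping that counts each element passing through it.
    // Not thread-safe: intended for sequential streams only.
    static class StreamCounter<T> implements Function<T, T> {
        private int counter = 0;

        @Override
        public T apply(T object) {
            counter++;
            return object;
        }

        public int get() {
            return counter;
        }
    }

    // Returns the number of elements that reached the collect step.
    static int countLongWords(List<String> words) {
        StreamCounter<String> myCounter = new StreamCounter<>();
        List<String> kept = words.stream()
                .filter(w -> w.length() > 3)
                .map(myCounter)                // counting happens here
                .collect(Collectors.toList()); // terminal operation drives the pipeline
        // In this pipeline the counter and the collected list agree.
        return myCounter.get();
    }

    public static void main(String[] args) {
        System.out.println(countLongWords(Arrays.asList("java", "map", "stream", "peek"))); // prints 3
    }
}
```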

Since the map invocation is again just another point of reuse, the counting step can be provided by a class that extends Stream and wraps the ordinary stream.

This way you can create something like:

AtomicLong myValue = new AtomicLong();
...
convert(stream).measure(myValue).map(...).measure(mySecondValue).filter(...).measure(myThirdValue).toList(...);

This way you can have your own Stream wrapper that transparently wraps every stream in its own version (which adds little performance or memory overhead) and measures the cardinality at any such measure point.

This is often done when analyzing the complexity of algorithms while creating map/reduce solutions. If you extend your stream implementation so that it takes only the name of the measure point rather than an AtomicLong instance for counting, it can hold an unlimited number of measure points while providing a flexible way to print a report.

Such an implementation can remember the concrete sequence of stream methods along with the position of each measure point and produce output like:
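One possible shape for such a wrapper, as a deliberately small sketch: the class MeasuredStream is hypothetical, it does not implement the full java.util.stream.Stream interface, and it exposes only the handful of operations needed for the chain shown above. The measure step is built on peek here, which is an assumption of this sketch rather than something the answer prescribes.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical thin wrapper around a Stream that adds a measure(...) step.
class MeasuredStream<T> {
    private final Stream<T> delegate;

    private MeasuredStream(Stream<T> delegate) {
        this.delegate = delegate;
    }

    static <T> MeasuredStream<T> convert(Stream<T> stream) {
        return new MeasuredStream<>(stream);
    }

    // Counts every element that passes this point of the pipeline.
    MeasuredStream<T> measure(AtomicLong counter) {
        return new MeasuredStream<>(delegate.peek(t -> counter.incrementAndGet()));
    }

    <R> MeasuredStream<R> map(Function<? super T, ? extends R> mapper) {
        Stream<R> mapped = delegate.map(mapper);
        return new MeasuredStream<>(mapped);
    }

    MeasuredStream<T> filter(Predicate<? super T> predicate) {
        return new MeasuredStream<>(delegate.filter(predicate));
    }

    List<T> toList() {
        return delegate.collect(Collectors.toList());
    }

    // Runs a small pipeline and returns the three measured cardinalities.
    static long[] demo() {
        AtomicLong first = new AtomicLong();
        AtomicLong second = new AtomicLong();
        AtomicLong third = new AtomicLong();
        convert(Stream.of(1, 2, 3, 4, 5, 6))
                .measure(first)
                .map(n -> n * 10)
                .measure(second)
                .filter(n -> n > 30)
                .measure(third)
                .toList();
        return new long[] {first.get(), second.get(), third.get()};
    }

    public static void main(String[] args) {
        long[] counts = demo();
        System.out.println(counts[0] + ", " + counts[1] + ", " + counts[2]); // prints 6, 6, 3
    }
}
```

A complete implementation would have to delegate every Stream method, which is the maintenance cost the first answer warns about.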

list -> (32k)map -> (32k)filter -> (5k)map -> avg()
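A much simpler approximation of named measure points, sketched under the assumption that plain peek calls are acceptable instead of a full Stream wrapper. The MeasurePoints class, its at/get/report methods and the report format are invented for this example and only loosely mimic the output above:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical registry of named counters, kept in registration order.
class MeasurePoints {
    private final Map<String, LongAdder> points = new LinkedHashMap<>();

    // Registers the name while the pipeline is being built and returns
    // a counting action suitable for Stream.peek.
    <T> Consumer<T> at(String name) {
        LongAdder adder = points.computeIfAbsent(name, k -> new LongAdder());
        return t -> adder.increment();
    }

    long get(String name) {
        LongAdder adder = points.get(name);
        return adder == null ? 0 : adder.sum();
    }

    // Renders all measure points in the order they were registered.
    String report() {
        return points.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue().sum())
                .collect(Collectors.joining(" -> "));
    }

    static String demo() {
        MeasurePoints m = new MeasurePoints();
        Stream.of("a", "bb", "ccc", "dddd")
                .peek(m.at("source"))
                .filter(s -> s.length() > 1)
                .peek(m.at("afterFilter"))
                .map(String::length)
                .peek(m.at("afterMap"))
                .forEach(n -> { });
        return m.report();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints source=4 -> afterFilter=3 -> afterMap=3
    }
}
```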

Such a stream implementation is written once and can be used for testing as well as for reporting.

Built into an everyday implementation, it makes it possible to gather statistics for certain processing steps and allows for dynamic optimization by using a different permutation of operations. A query optimizer would be one example of this.

So in your case, the best approach would be to reuse a StreamCounter first and, depending on the frequency of use, the number of counters and your affinity for the DRY principle, eventually implement a more sophisticated solution later on.

PS: StreamCounter uses an int value and is not thread-safe, so in a parallel stream setup one would replace the int with an AtomicInteger instance.
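A sketch of that thread-safe variant, assuming a parallel stream over a made-up integer range. The names ParallelCounterDemo, AtomicStreamCounter and countEvens are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.IntStream;

class ParallelCounterDemo {

    // Same identity-mapping idea as StreamCounter, but safe to call
    // from several worker threads at once.
    static class AtomicStreamCounter<T> implements Function<T, T> {
        private final AtomicInteger counter = new AtomicInteger();

        @Override
        public T apply(T object) {
            counter.incrementAndGet();
            return object;
        }

        public int get() {
            return counter.get();
        }
    }

    // Counts the even numbers in [0, upperExclusive) that reach forEach.
    static int countEvens(int upperExclusive) {
        AtomicStreamCounter<Integer> counter = new AtomicStreamCounter<>();
        IntStream.range(0, upperExclusive)
                .parallel()
                .boxed()
                .filter(n -> n % 2 == 0)
                .map(counter)
                .forEach(n -> { }); // terminal operation; processes every element
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countEvens(10_000)); // prints 5000
    }
}
```

With a plain int field the same pipeline could lose increments under contention, so the result would not be reliable.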



Source: https://stackoverflow.com/questions/43653761/java-8-streams-count-all-elements-which-enter-the-terminal-operation
