I have recently started using collectingAndThen and found that it is taking a bit long time comparatively to the other coding procedures, which i used for performing the similar tasks.
Here is my code:
System.out.println("CollectingAndThen");
Long t = System.currentTimeMillis();
String personWithMaxAge = persons.stream()
.collect(Collectors.collectingAndThen(
Collectors.maxBy(Comparator.comparing(Person::getAge)),
(Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
));
System.out.println("personWithMaxAge - "+personWithMaxAge + " time taken = "+(System.currentTimeMillis() - t));
Long t2 = System.currentTimeMillis();
String personWithMaxAge2 = persons.stream().sorted(Comparator.comparing(Person::getAge).reversed())
.findFirst().get().name;
System.out.println("personWithMaxAge2 : "+personWithMaxAge2+ " time taken = "+(System.currentTimeMillis() - t2));
And here it the output:
CollectingAndThen
personWithMaxAge - Peter time taken = 17
personWithMaxAge2 : Peter time taken = 1
which is showing that collectingAndThen is taking a greater time comparatively.
So my question is - Should I keep on collectingAndThen or there is some other suggestions?
collectingAndThen
adds a single action that is just performed at the end of the collection.
So
String personWithMaxAge = persons.stream()
.collect(Collectors.collectingAndThen(
Collectors.maxBy(Comparator.comparing(Person::getAge)),
(Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
));
is not different to
Optional<Person> p = persons.stream()
.collect(Collectors.maxBy(Comparator.comparing(Person::getAge)));
String personWithMaxAge = p.isPresent() ? p.get().getName() : "none";
The actual advantage of specifying the action within a collector shows when you use the resulting collector as input to another collect, e.g. groupingBy(f1, collectingAndThen(collector, f2))
.
Since this is a single trivial action performed exactly once at the end, there is no impact on the performance. Further, it’s unlikely that a sorted
based solution is faster than a maxBy
operation for any nontrivial input.
You are simply using a broken benchmark violating several of the rules listed in “How do I write a correct micro-benchmark in Java?”. Most notably, you are measuring the initial initialization overhead of the first usage of the Stream framework. Just swapping the order of the two operations will give you an entirely different result.
Still, there is no need to make an operation unnecessarily complicated. As said, the advantage of collectors mirroring existing Stream operations is that they can be combined with other collectors. If no such combining is need, just use the straight-forward code
String personWithMaxAge = persons.stream()
.max(Comparator.comparing(Person::getAge))
.map(Person::getName).orElse("none");
which is simpler than the Collector usage and more efficient than the sorted
based solution.
TL;DR; The way you measure things is probably off.
I created a much more valid performance bench of your test, using JMH, set up as the following (the persons list should be initialized differently to further enhance the confidence in the results) :
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder().include(SO.class.getSimpleName()).jvmArgs("-Dfile.encoding=UTF-8").build();
new Runner(opt).run();
}
public static final class Person {
public Person(String name, int age) {
this.name = name;
this.age = age;
}
String name;
int age;
public int getAge() {return age;};
public String getName() {return this.name;}
}
public static final List<Person> persons = IntStream.range(0, 100).mapToObj(i -> new Person("person" + i, RandomUtils.nextInt(0, 50))).collect(Collectors.toList());
@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 4, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 4, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(2)
public float collectingAndThen() {
String personWithMaxAge = persons.stream()
.collect(Collectors.collectingAndThen(
Collectors.maxBy(Comparator.comparing(Person::getAge)),
(Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
));
return personWithMaxAge.length() * 1.0f;
}
@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 4, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 4, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(2)
public float sortFindFirst() {
String personWithMaxAge2 = persons.stream().sorted(Comparator.comparing(Person::getAge).reversed())
.findFirst().get().name;
return personWithMaxAge2.length() * 1.0f;
}
The results are beyond doubt on this test (with 100 persons in the list) :
# Run complete. Total time: 00:02:41
Benchmark Mode Cnt Score Error Units
Benchmark.SO.collectingAndThen thrpt 8 1412304,072 ± 53963,266 ops/s
Benchmark.SO.sortFindFirst thrpt 8 331214,270 ± 7966,082 ops/s
The collectingAndThen is 4 times faster.
If you scale down the persons list to 5 people, the figures change completly :
Benchmark Mode Cnt Score Error Units
Benchmark.SO.collectingAndThen thrpt 8 14529905,529 ± 423196,066 ops/s
Benchmark.SO.sortFindFirst thrpt 8 7645716,643 ± 538730,614 ops/s
But collectingAndThen is still 2x faster.
I suspect your test is off. There are a number of possible reasons, for example, classloading, JIT compilation and other warmups, ...
As @assylias pointed out in the comments, you should rely on better crafted micro benchmarks for measuring time of such "small" methods, to avoid the previously stated side effects. See : How do I write a correct micro-benchmark in Java?
No, collectingAndThen
is fine from an efficiency perspective.
Consider this code for generating a list of random integers:
List<Integer> list =
new Random().ints(5).boxed().collect(Collectors.toList());
You can get the max from this list using your two methods:
list.stream().collect(Collectors.collectingAndThen(
Collectors.maxBy(Comparator.naturalOrder()),
(Optional<Integer> n) -> n.orElse(0)));
and
list.stream().sorted().findFirst().get();
If you just time a single execution of these two methods, you might get a time something like this (ideone):
collectAndThen 2.884806
sortFindFirst 1.898522
Those are times in milliseconds.
But keep iterating, and you find the times change dramatically. After 100 iterations:
collectAndThen 0.00792
sortFindFirst 0.010873
Still times in milliseconds.
So, as previously stated, you're just not benchmarking the two approaches correctly.
来源:https://stackoverflow.com/questions/47810524/is-collectingandthen-method-enough-efficient