Is collectingAndThen method enough efficient?

问题

I have recently started using collectingAndThen and found that it is taking a bit long time comparatively to the other coding procedures, which i used for performing the similar tasks.

Here is my code:

        System.out.println("CollectingAndThen");
        Long t = System.currentTimeMillis();
        String personWithMaxAge = persons.stream()
                                        .collect(Collectors.collectingAndThen(
                                                                Collectors.maxBy(Comparator.comparing(Person::getAge)),
                                                                (Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
                                                ));


        System.out.println("personWithMaxAge - "+personWithMaxAge + " time taken = "+(System.currentTimeMillis() - t));
        Long t2 = System.currentTimeMillis();
        String personWithMaxAge2 = persons.stream().sorted(Comparator.comparing(Person::getAge).reversed())
                                                    .findFirst().get().name;
        System.out.println("personWithMaxAge2 : "+personWithMaxAge2+ " time taken = "+(System.currentTimeMillis() - t2));

And here it the output:

CollectingAndThen
personWithMaxAge - Peter time taken = 17
personWithMaxAge2 : Peter time taken = 1

which is showing that collectingAndThen is taking a greater time comparatively.

So my question is - Should I keep on collectingAndThen or there is some other suggestions?

回答1:

collectingAndThen adds a single action that is just performed at the end of the collection.

String personWithMaxAge = persons.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.maxBy(Comparator.comparing(Person::getAge)),
        (Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
    ));

is not different to

Optional<Person> p = persons.stream()
    .collect(Collectors.maxBy(Comparator.comparing(Person::getAge)));
String personWithMaxAge = p.isPresent() ? p.get().getName() : "none";

The actual advantage of specifying the action within a collector shows when you use the resulting collector as input to another collect, e.g. groupingBy(f1, collectingAndThen(collector, f2)).

Since this is a single trivial action performed exactly once at the end, there is no impact on the performance. Further, it’s unlikely that a sorted based solution is faster than a maxBy operation for any nontrivial input.

You are simply using a broken benchmark violating several of the rules listed in “How do I write a correct micro-benchmark in Java?”. Most notably, you are measuring the initial initialization overhead of the first usage of the Stream framework. Just swapping the order of the two operations will give you an entirely different result.

Still, there is no need to make an operation unnecessarily complicated. As said, the advantage of collectors mirroring existing Stream operations is that they can be combined with other collectors. If no such combining is need, just use the straight-forward code

String personWithMaxAge = persons.stream()
    .max(Comparator.comparing(Person::getAge))
    .map(Person::getName).orElse("none");

which is simpler than the Collector usage and more efficient than the sorted based solution.

回答2:

TL;DR; The way you measure things is probably off.

I created a much more valid performance bench of your test, using JMH, set up as the following (the persons list should be initialized differently to further enhance the confidence in the results) :

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder().include(SO.class.getSimpleName()).jvmArgs("-Dfile.encoding=UTF-8").build();
    new Runner(opt).run();
}

public static final class Person {
    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
    String name;
    int age;
    public int getAge() {return age;};
    public String getName() {return this.name;}
}

public static final List<Person> persons = IntStream.range(0, 100).mapToObj(i -> new Person("person" + i, RandomUtils.nextInt(0, 50))).collect(Collectors.toList());

@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 4, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 4, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(2)
public float collectingAndThen() {
    String personWithMaxAge = persons.stream()
            .collect(Collectors.collectingAndThen(
                    Collectors.maxBy(Comparator.comparing(Person::getAge)),
                    (Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
            ));
    return personWithMaxAge.length() * 1.0f;
}

@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 4, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 4, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(2)
public float sortFindFirst() {
    String personWithMaxAge2 = persons.stream().sorted(Comparator.comparing(Person::getAge).reversed())
            .findFirst().get().name;
    return personWithMaxAge2.length() * 1.0f;
}

The results are beyond doubt on this test (with 100 persons in the list) :

# Run complete. Total time: 00:02:41

Benchmark                        Mode  Cnt        Score       Error  Units
Benchmark.SO.collectingAndThen  thrpt    8  1412304,072 ± 53963,266  ops/s
Benchmark.SO.sortFindFirst      thrpt    8   331214,270 ±  7966,082  ops/s

The collectingAndThen is 4 times faster.

If you scale down the persons list to 5 people, the figures change completly :

Benchmark                        Mode  Cnt         Score        Error  Units
Benchmark.SO.collectingAndThen  thrpt    8  14529905,529 ± 423196,066  ops/s
Benchmark.SO.sortFindFirst      thrpt    8   7645716,643 ± 538730,614  ops/s

But collectingAndThen is still 2x faster.

I suspect your test is off. There are a number of possible reasons, for example, classloading, JIT compilation and other warmups, ...

As @assylias pointed out in the comments, you should rely on better crafted micro benchmarks for measuring time of such "small" methods, to avoid the previously stated side effects. See : How do I write a correct micro-benchmark in Java?

回答3:

No, collectingAndThen is fine from an efficiency perspective.

Consider this code for generating a list of random integers:

List<Integer> list =
    new Random().ints(5).boxed().collect(Collectors.toList());

You can get the max from this list using your two methods:

    list.stream().collect(Collectors.collectingAndThen(
        Collectors.maxBy(Comparator.naturalOrder()),
        (Optional<Integer> n) -> n.orElse(0)));

and

    list.stream().sorted().findFirst().get();

If you just time a single execution of these two methods, you might get a time something like this (ideone):

collectAndThen 2.884806
sortFindFirst  1.898522

Those are times in milliseconds.

But keep iterating, and you find the times change dramatically. After 100 iterations:

collectAndThen 0.00792
sortFindFirst  0.010873

Still times in milliseconds.

So, as previously stated, you're just not benchmarking the two approaches correctly.

来源：https://stackoverflow.com/questions/47810524/is-collectingandthen-method-enough-efficient

标签

java

java-8

java-stream

collectors

collect