Java 8 Distinct by property

傲寒 2020-11-21 22:35

In Java 8 how can I filter a collection using the Stream API by checking the distinctness of a property of each object?

For example I have a list of

29 answers
  •  名媛妹妹
    2020-11-21 22:54

    While the highest-upvoted answer is the best answer as far as idiomatic Java 8 goes, it is also the worst in terms of performance. If you really want a poorly performing application, go ahead and use it. The simple requirement of extracting a unique set of Person names can be met with a plain for-each loop and a Set. Things get even worse once the list grows beyond 10 elements.
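
    If whole objects are wanted rather than just their names, the same for-each-and-Set idea can be sketched as below. This is only an illustration: the Person class with its name and age fields, and the distinctByName helper, are hypothetical and not taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ForEachDistinctSketch {

    // Hypothetical stand-in for a Person type; the fields are assumptions.
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Keeps the first Person seen for each name, in encounter order.
    static List<Person> distinctByName(List<Person> people) {
        Set<String> seenNames = new HashSet<>();
        List<Person> result = new ArrayList<>();
        for (Person p : people) {
            // Set.add returns false when the name was already present.
            if (seenNames.add(p.name)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Tom", 30), new Person("Dick", 25), new Person("Tom", 41));
        for (Person p : distinctByName(people)) {
            System.out.println(p.name + " " + p.age);
        }
    }
}
```

    With the three sample entries, the second "Tom" is dropped and the first one is kept.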

    Suppose you have a collection of 20 objects, like this:

    public static final List<SimpleEvent> testList = Arrays.asList(
                new SimpleEvent("Tom"), new SimpleEvent("Dick"), new SimpleEvent("Harry"), new SimpleEvent("Tom"),
                new SimpleEvent("Dick"), new SimpleEvent("Huckle"), new SimpleEvent("Berry"), new SimpleEvent("Tom"),
                new SimpleEvent("Dick"), new SimpleEvent("Moses"), new SimpleEvent("Chiku"), new SimpleEvent("Cherry"),
                new SimpleEvent("Roses"), new SimpleEvent("Moses"), new SimpleEvent("Chiku"), new SimpleEvent("gotya"),
                new SimpleEvent("Gotye"), new SimpleEvent("Nibble"), new SimpleEvent("Berry"), new SimpleEvent("Jibble"));
    

    where your SimpleEvent class looks like this:

    public class SimpleEvent {

        private String name;
        private String type;

        public SimpleEvent(String name) {
            this.name = name;
            this.type = "type_" + name;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public String getType() {
            return type;
        }

        public void setType(String type) {
            this.type = type;
        }
    }
    

    And to test, you have JMH code like this (please note, I'm using the same distinctByKey predicate mentioned in the accepted answer):

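    // Imports used by the benchmark class (MyBenchmark) shown here
    import java.util.HashSet;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Collectors;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.infra.Blackhole;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    @Benchmark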
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{
    
        Set<String> uniqueNames = testList
                .stream()
                .filter(distinctByKey(SimpleEvent::getName))
                .map(SimpleEvent::getName)
                .collect(Collectors.toSet());
        blackhole.consume(uniqueNames);
    }
    
    @Benchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
        Set<String> uniqueNames = new HashSet<>();
    
        for (SimpleEvent event : testList) {
            uniqueNames.add(event.getName());
        }
        blackhole.consume(uniqueNames);
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName())
                .forks(1)
                .mode(Mode.Throughput)
                .warmupBatchSize(3)
                .warmupIterations(3)
                .measurementIterations(3)
                .build();
    
        new Runner(opt).run();
    }
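
    The benchmark above calls distinctByKey, which this answer does not show. A minimal sketch follows, assuming the helper has the commonly seen ConcurrentHashMap-backed form; the exact code in the accepted answer may differ. The firstOfEachLength method and the sample words are hypothetical and exist only to demonstrate the predicate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeySketch {

    // Assumed shape of the helper: a stateful predicate that remembers
    // every key it has seen so far.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        // Set.add returns true only the first time a given key is added.
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Hypothetical demo helper: keeps the first word of each length.
    static List<String> firstOfEachLength(List<String> words) {
        return words.stream()
                .filter(distinctByKey(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstOfEachLength(
                Arrays.asList("Tom", "Dick", "Moses", "Tim", "Huck"))); // prints [Tom, Dick, Moses]
    }
}
```

    In this form every call to distinctByKey allocates a new concurrent set, and the set rejects null keys, so an element whose key is null would cause a NullPointerException.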
    

    Then you'll get benchmark results like this:

    Benchmark                                  Mode  Samples        Score  Score error  Units
    c.s.MyBenchmark.aForEachBasedUniqueSet    thrpt        3  2635199.952  1663320.718  ops/s
    c.s.MyBenchmark.aStreamBasedUniqueSet     thrpt        3   729134.695   895825.697  ops/s
    

    As you can see, the simple for-each loop delivers about 3.6 times the throughput of the Java 8 Stream version (2,635,199 vs. 729,134 ops/s), and its error is smaller relative to its score (about 63% vs. 123%).

    The higher the throughput, the better the performance.
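
    One detail worth noting when reading these numbers: the stream benchmark deduplicates twice, once in filter(distinctByKey(...)) and again in Collectors.toSet(). A stream pipeline without the filter step produces the same set of names as the for-each loop, as the sketch below shows. Event is a hypothetical stand-in for SimpleEvent, and no timing is measured here, so this says nothing about how much of the gap the extra filter accounts for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class UniqueNamesSketch {

    // Hypothetical stand-in for SimpleEvent, reduced to the one field used here.
    static final class Event {
        private final String name;

        Event(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Same logic as the for-each benchmark: the Set drops duplicates.
    static Set<String> viaForEach(List<Event> events) {
        Set<String> names = new HashSet<>();
        for (Event e : events) {
            names.add(e.getName());
        }
        return names;
    }

    // Stream version without the distinctByKey filter: toSet() already deduplicates.
    static Set<String> viaStream(List<Event> events) {
        return events.stream()
                .map(Event::getName)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("Tom"), new Event("Dick"), new Event("Tom"));
        System.out.println(viaForEach(events).equals(viaStream(events))); // prints true
    }
}
```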
