Java 8 Distinct by property

傲寒 2020-11-21 22:35

In Java 8 how can I filter a collection using the Stream API by checking the distinctness of a property of each object?

For example I have a list of

29 answers
  • 2020-11-21 22:54

    While the highest-upvoted answer is the best answer as far as Java 8 idioms go, it is also the worst in terms of performance. If you really want a slow application, go ahead and use it. The simple requirement of extracting a unique set of person names can be met with a plain for-each loop and a Set. Things get even worse once the list grows beyond 10 elements.

    Suppose you have a collection of 20 objects, like this:

    public static final List<SimpleEvent> testList = Arrays.asList(
            new SimpleEvent("Tom"), new SimpleEvent("Dick"), new SimpleEvent("Harry"), new SimpleEvent("Tom"),
            new SimpleEvent("Dick"), new SimpleEvent("Huckle"), new SimpleEvent("Berry"), new SimpleEvent("Tom"),
            new SimpleEvent("Dick"), new SimpleEvent("Moses"), new SimpleEvent("Chiku"), new SimpleEvent("Cherry"),
            new SimpleEvent("Roses"), new SimpleEvent("Moses"), new SimpleEvent("Chiku"), new SimpleEvent("gotya"),
            new SimpleEvent("Gotye"), new SimpleEvent("Nibble"), new SimpleEvent("Berry"), new SimpleEvent("Jibble"));
    

    where the SimpleEvent class looks like this:

    public class SimpleEvent {
    
        private String name;
        private String type;
    
        public SimpleEvent(String name) {
            this.name = name;
            this.type = "type_" + name;
        }
    
        public String getName() {
            return name;
        }
    
        public void setName(String name) {
            this.name = name;
        }
    
        public String getType() {
            return type;
        }
    
        public void setType(String type) {
            this.type = type;
        }
    }
    

    To test, you have JMH code like this (note that I'm using the same distinctByKey predicate mentioned in the accepted answer):

    @Benchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{
    
        Set<String> uniqueNames = testList
                .stream()
                .filter(distinctByKey(SimpleEvent::getName))
                .map(SimpleEvent::getName)
                .collect(Collectors.toSet());
        blackhole.consume(uniqueNames);
    }
    
    @Benchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
        Set<String> uniqueNames = new HashSet<>();
    
        for (SimpleEvent event : testList) {
            uniqueNames.add(event.getName());
        }
        blackhole.consume(uniqueNames);
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName())
                .forks(1)
                .mode(Mode.Throughput)
                .warmupBatchSize(3)
                .warmupIterations(3)
                .measurementIterations(3)
                .build();
    
        new Runner(opt).run();
    }
    

    Then you'll have Benchmark results like this:

    Benchmark                                  Mode  Samples        Score  Score error  Units
    c.s.MyBenchmark.aForEachBasedUniqueSet    thrpt        3  2635199.952  1663320.718  ops/s
    c.s.MyBenchmark.aStreamBasedUniqueSet     thrpt        3   729134.695   895825.697  ops/s
    

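    As you can see, the simple for-each loop achieves more than 3 times the throughput of the Java 8 Stream version (roughly 3.6 times), and its error is smaller relative to its score (about 63% of the score versus about 123%).

    The higher the throughput, the better the performance.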
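
    The distinctByKey predicate used in the benchmark is not reproduced in this answer. As a rough sketch, assuming it follows the usual shape of such a helper (a concurrent set of keys already seen), the two benchmarked approaches could be checked against each other like this. The class name DistinctByKeySketch and the nested Event class are made up for illustration; Event stands in for SimpleEvent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical class; none of these names come from the original answer.
final class DistinctByKeySketch {

    // Minimal stand-in for SimpleEvent.
    static final class Event {
        private final String name;
        Event(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Assumed shape of distinctByKey: a stateful predicate that is true
    // only the first time a given key is seen.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Stream-based variant, as in the benchmark.
    static Set<String> uniqueNamesWithStream(List<Event> events) {
        return events.stream()
                .filter(distinctByKey(Event::getName))
                .map(Event::getName)
                .collect(Collectors.toSet());
    }

    // For-each variant, as in the benchmark.
    static Set<String> uniqueNamesWithLoop(List<Event> events) {
        Set<String> uniqueNames = new HashSet<>();
        for (Event event : events) {
            uniqueNames.add(event.getName());
        }
        return uniqueNames;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("Tom"), new Event("Dick"), new Event("Tom"), new Event("Harry"));
        System.out.println(uniqueNamesWithStream(events).equals(uniqueNamesWithLoop(events)));
    }
}
```

    Both methods return the same set of names. The stream version de-duplicates twice, once in the filter and once in toSet().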

  • 2020-11-21 22:55

    An alternative would be to place the persons in a map using the name as a key:

    persons.collect(Collectors.toMap(Person::getName, p -> p, (p, q) -> p)).values();
    

    Note that when several persons share a name, the Person that is kept is the first one encountered.
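
    A self-contained sketch of this approach, assuming a minimal Person class with a name and an age (both invented here for illustration). It also passes LinkedHashMap::new as the map supplier, which is an addition to the snippet above: in the OpenJDK implementation the three-argument toMap collects into a HashMap, so the order of values() would otherwise not follow the order of the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class and Person type, for illustration only.
final class ToMapDistinctSketch {

    static final class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    // Keeps the first Person seen for each name, in encounter order.
    static List<Person> distinctByName(List<Person> persons) {
        return new ArrayList<>(persons.stream()
                .collect(Collectors.toMap(
                        Person::getName,            // key: the property to be distinct on
                        p -> p,                     // value: the person itself
                        (first, second) -> first,   // on a duplicate key, keep the earlier one
                        LinkedHashMap::new))        // assumption: we want insertion order
                .values());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Alice", 30), new Person("Bob", 25), new Person("Alice", 41));
        for (Person p : distinctByName(persons)) {
            System.out.println(p.getName() + " " + p.getAge());
        }
    }
}
```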

  • 2020-11-21 22:55

    In my case I needed to compare each element with the previous one. I created a stateful Predicate that checks whether the current element differs from the previous one and, if it does, keeps it.

    public List<Log> fetchLogById(Long id) {
        return this.findLogById(id).stream()
            .filter(new LogPredicate())
            .collect(Collectors.toList());
    }
    
    public class LogPredicate implements Predicate<Log> {
    
        // Last element that passed the filter; null until the first call.
        private Log previous;
    
        @Override
        public boolean test(Log current) {
            // Keep the first element, and any element whose id differs from the last kept one.
            boolean isDifferent = previous == null || verifyIfDifferentLog(current, previous);
    
            if (isDifferent) {
                previous = current;
            }
            return isDifferent;
        }
    
        private boolean verifyIfDifferentLog(Log current, Log previous) {
            return !current.getId().equals(previous.getId());
        }
    
    }
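
    To illustrate what a predicate like LogPredicate does, here is a small generic sketch run on plain Long ids instead of Log objects. The names ChangedFromPreviousSketch, ChangedFromPrevious and dropAdjacentRepeats are hypothetical. Note that it only removes repeats that sit next to each other, and that it relies on the stream being sequential and ordered, since the Stream API documentation asks for stateless predicates in filter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical names throughout; a sketch, not the answer's actual Log code.
final class ChangedFromPreviousSketch {

    // Stateful predicate: true when the element differs from the last one it accepted.
    static final class ChangedFromPrevious<T> implements Predicate<T> {
        private T previous;
        private boolean hasPrevious;

        @Override
        public boolean test(T current) {
            boolean isDifferent = !hasPrevious || !Objects.equals(current, previous);
            if (isDifferent) {
                previous = current;
                hasPrevious = true;
            }
            return isDifferent;
        }
    }

    // Must be used on a sequential stream; a fresh predicate is created per call.
    static List<Long> dropAdjacentRepeats(List<Long> ids) {
        return ids.stream()
                .filter(new ChangedFromPrevious<Long>())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(dropAdjacentRepeats(Arrays.asList(1L, 1L, 2L, 2L, 1L)));
    }
}
```

    For the input 1, 1, 2, 2, 1 the result is 1, 2, 1: the last 1 survives because it is not adjacent to the earlier ones.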
    
  • 2020-11-21 22:56

    Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:

    Seq.seq(persons).distinct(Person::getName).toList();
    

    Under the hood, it does practically the same thing as the accepted answer, though.

  • 2020-11-21 22:57

    My approach is to group all the objects that share the same property value, cut each group down to a single element, and finally collect the results as a List.

    List<YourPersonClass> listWithDistinctPersons = persons.stream()
            // operators to remove duplicates based on person name
            .collect(Collectors.groupingBy(p -> p.getName()))
            .values()
            .stream()
            // cut short the groups to size of 1
            .flatMap(group -> group.stream().limit(1))
            // collect distinct users as list
            .collect(Collectors.toList());
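
    A runnable sketch of the same idea, assuming a minimal Person class invented for illustration. One change from the snippet above, which is my own addition: it groups into a LinkedHashMap so the result keeps the order in which names first appear, something the single-argument groupingBy does not promise.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class and Person type, for illustration only.
final class GroupingDistinctSketch {

    static final class Person {
        private final String name;
        private final String city;
        Person(String name, String city) { this.name = name; this.city = city; }
        String getName() { return name; }
        String getCity() { return city; }
    }

    static List<Person> distinctByName(List<Person> persons) {
        // Group persons by name; LinkedHashMap keeps names in first-seen order.
        Map<String, List<Person>> groups = persons.stream()
                .collect(Collectors.groupingBy(
                        Person::getName, LinkedHashMap::new, Collectors.toList()));

        // Cut each group down to its first element and flatten back into one list.
        return groups.values().stream()
                .flatMap(group -> group.stream().limit(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", "Oslo"), new Person("Bob", "Rome"), new Person("Ann", "Lima"));
        for (Person p : distinctByName(persons)) {
            System.out.println(p.getName() + " " + p.getCity());
        }
    }
}
```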
    
  • 2020-11-21 22:57

    Late to the party but I sometimes use this one-liner as an equivalent:

    ((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply
    

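    The expression is a Predicate<Value>: the key mapping happens inside the predicate itself, so it can be passed straight to filter() and the stream still carries the original Value elements. This is of course less readable, but it can be handy when you want to avoid declaring a separate helper method.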
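
    A small sketch of how the one-liner might be used, with a made-up Person class in place of Value and String in place of Key. The element type of the HashSet is spelled out here for clarity, which is a small departure from the expression above. The set is created once, when the expression is evaluated, so the resulting predicate should be used for a single sequential stream only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical class and Person type, for illustration only.
final class InlineDistinctSketch {

    static final class Person {
        private final String name;
        Person(String name) { this.name = name; }
        String getName() { return name; }
    }

    static List<Person> distinctByName(List<Person> persons) {
        // getName maps a Person to its key; HashSet.add returns true only for unseen keys.
        Predicate<Person> firstWithName =
                ((Function<Person, String>) Person::getName)
                        .andThen(new HashSet<String>()::add)::apply;

        return persons.stream()
                .filter(firstWithName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann"), new Person("Bob"), new Person("Ann"));
        for (Person p : distinctByName(persons)) {
            System.out.println(p.getName());
        }
    }
}
```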
