In Java 8 how can I filter a collection using the Stream
API by checking the distinctness of a property of each object?
For example I have a list of
While the highest upvoted answer is absolutely best answer wrt Java 8, it is at the same time absolutely worst in terms of performance. If you really want a bad low performant application, then go ahead and use it. Simple requirement of extracting a unique set of Person Names shall be achieved by mere "For-Each" and a "Set". Things get even worse if list is above size of 10.
Consider you have a collection of 20 Objects, like this:
public static final List<SimpleEvent> testList = Arrays.asList(
new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));
Where you object SimpleEvent
looks like this:
public class SimpleEvent {
private String name;
private String type;
public SimpleEvent(String name) {
this.name = name;
this.type = "type_"+name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
}
And to test, you have JMH code like this,(Please note, im using the same distinctByKey Predicate mentioned in accepted answer) :
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = testList
.stream()
.filter(distinctByKey(SimpleEvent::getName))
.map(SimpleEvent::getName)
.collect(Collectors.toSet());
blackhole.consume(uniqueNames);
}
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = new HashSet<>();
for (SimpleEvent event : testList) {
uniqueNames.add(event.getName());
}
blackhole.consume(uniqueNames);
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(MyBenchmark.class.getSimpleName())
.forks(1)
.mode(Mode.Throughput)
.warmupBatchSize(3)
.warmupIterations(3)
.measurementIterations(3)
.build();
new Runner(opt).run();
}
Then you'll have Benchmark results like this:
Benchmark Mode Samples Score Score error Units
c.s.MyBenchmark.aForEachBasedUniqueSet thrpt 3 2635199.952 1663320.718 ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet thrpt 3 729134.695 895825.697 ops/s
And as you can see, a simple For-Each is 3 times better in throughput and less in error score as compared to Java 8 Stream.
Higher the throughput, better the performance
An alternative would be to place the persons in a map using the name as a key:
persons.collect(Collectors.toMap(Person::getName, p -> p, (p, q) -> p)).values();
Note that the Person that is kept, in case of a duplicate name, will be the first encontered.
In my case I needed to control what was the previous element. I then created a stateful Predicate where I controled if the previous element was different from the current element, in that case I kept it.
public List<Log> fetchLogById(Long id) {
return this.findLogById(id).stream()
.filter(new LogPredicate())
.collect(Collectors.toList());
}
public class LogPredicate implements Predicate<Log> {
private Log previous;
public boolean test(Log atual) {
boolean isDifferent = previouws == null || verifyIfDifferentLog(current, previous);
if (isDifferent) {
previous = current;
}
return isDifferent;
}
private boolean verifyIfDifferentLog(Log current, Log previous) {
return !current.getId().equals(previous.getId());
}
}
Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:
Seq.seq(persons).distinct(Person::getName).toList();
Under the hood, it does practically the same thing as the accepted answer, though.
My approach to this is to group all the objects with same property together, then cut short the groups to size of 1 and then finally collect them as a List
.
List<YourPersonClass> listWithDistinctPersons = persons.stream()
//operators to remove duplicates based on person name
.collect(Collectors.groupingBy(p -> p.getName()))
.values()
.stream()
//cut short the groups to size of 1
.flatMap(group -> group.stream().limit(1))
//collect distinct users as list
.collect(Collectors.toList());
Late to the party but I sometimes use this one-liner as an equivalent:
((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply
The expression is a Predicate<Value>
but since the map is inline, it works as a filter. This is of course less readable but sometimes it can be helpful to avoid the method.