Since I use streams a great deal, some of them dealing with a large amount of data, I thought it would be a good idea to pre-allocate my collection-based collectors with an appr
First of all, the UNORDERED characteristic of a Collector is there to aid performance and nothing else. There is nothing wrong with a Collector that does not depend on the encounter order but lacks that characteristic.
Whether this characteristic has an impact depends on the stream operations themselves and on implementation details. While the current implementation may not draw much advantage from it, due to the difficulties with back-propagation, that doesn’t imply that future versions won’t. Of course, a stream which is already unordered is not affected by the UNORDERED characteristic of the Collector. And not all stream operations have the potential to benefit from it.
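To check what a given collector actually declares, you can inspect its characteristics directly. This is a minimal sketch; the class name CharacteristicsDemo and the helper isUnordered are made up for illustration, and the result for toCollection reflects OpenJDK's behavior rather than a specified guarantee:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CharacteristicsDemo {
    // Hypothetical helper: does this collector declare UNORDERED?
    static boolean isUnordered(Collector<?, ?, ?> c) {
        return c.characteristics().contains(Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Collector<String, ?, Set<String>> toSet = Collectors.toSet();
        Collector<String, ?, HashSet<String>> toHashSet =
                Collectors.toCollection(HashSet::new);

        // toSet() is documented as an unordered collector.
        System.out.println("toSet:        " + toSet.characteristics());
        // toCollection(...) cannot know whether its target ignores order,
        // so OpenJDK does not flag it as UNORDERED (assumption: OpenJDK behavior).
        System.out.println("toCollection: " + toHashSet.characteristics());
        System.out.println(isUnordered(toSet) + " / " + isUnordered(toHashSet));
    }
}
```

So a HashSet-producing toCollection collector is exactly the case described above: it does not depend on encounter order, yet it does not say so.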
So the more relevant question is how important it is not to prevent such potential optimizations (perhaps in the future).
Note that there are other unspecified implementation details affecting the potential optimizations when it comes to your second variant. The toCollection(Supplier) collector has unspecified inner workings and only guarantees to provide a final result of the type produced by the Supplier. In contrast, Collector.of(() -> new HashSet<>(initialCapacity), Set::add, (c1, c2) -> { c1.addAll(c2); return c1; }, IDENTITY_FINISH, UNORDERED) defines precisely how the collector ought to work and may also hinder internal optimizations of collection-producing collectors in future versions.
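For comparison, here are the two variants side by side as a rough sketch. The class name TwoVariants and the method names viaToCollection and viaCollectorOf are invented for this illustration:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TwoVariants {
    // Variant 1: only the result type is fixed; how the elements
    // get into the collection is left to the library.
    static <T> Collector<T, ?, HashSet<T>> viaToCollection(int initialCapacity) {
        return Collectors.toCollection(() -> new HashSet<>(initialCapacity));
    }

    // Variant 2: supplier, accumulator and combiner are spelled out by hand,
    // which leaves the library no room to substitute its own strategy.
    static <T> Collector<T, ?, Set<T>> viaCollectorOf(int initialCapacity) {
        return Collector.of(
                () -> new HashSet<>(initialCapacity),
                Set::add,
                (c1, c2) -> { c1.addAll(c2); return c1; },
                Collector.Characteristics.IDENTITY_FINISH,
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        HashSet<String> a = Stream.of("x", "y", "x").collect(viaToCollection(16));
        Set<String> b = Stream.of("x", "y", "x").collect(viaCollectorOf(16));
        System.out.println(a.equals(b)); // true: both contain x and y
    }
}
```

Both produce the same set today; the difference lies only in how much of the collector's behavior your code pins down.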
So a way to specify the characteristics without touching the other aspects of a Collector would be the best solution, but as far as I know, there is no simple way offered by the existing API. But it’s easy to build such a facility yourself:
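    // Requires java.util.{Collections, EnumSet, Set} and java.util.stream.Collector.
    public static <T, A, R> Collector<T, A, R> characteristics(
            Collector<T, A, R> c, Collector.Characteristics... ch) {
        Set<Collector.Characteristics> o = c.characteristics();
        if (!o.isEmpty()) {
            // Merge the collector's existing characteristics with the additional ones.
            o = EnumSet.copyOf(o);
            Collections.addAll(o, ch);
            // Use a fresh array: reusing ch could leave a trailing null if ch had duplicates.
            ch = o.toArray(new Collector.Characteristics[0]);
        }
        // Same functions as the original collector, only the characteristics differ.
        return Collector.of(c.supplier(), c.accumulator(), c.combiner(), c.finisher(), ch);
    }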
With that method, it’s easy to write, e.g. for a stream of strings:

    HashSet<String> set = stream
        .collect(characteristics(toCollection(() -> new HashSet<>(capacity)), UNORDERED));
or provide your own factory method:

    public static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
        return characteristics(toCollection(() -> new HashSet<>(initialCapacity)), UNORDERED);
    }
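Put together, a self-contained version might look like the following sketch. The class name SizedSetDemo is made up, the helper is a slightly simplified variant of the one above (it always merges into a fresh EnumSet), and the capacity formula is my own assumption based on HashSet's default load factor of 0.75:

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SizedSetDemo {
    // Keeps the collector's functions and extends its characteristics.
    static <T, A, R> Collector<T, A, R> characteristics(
            Collector<T, A, R> c, Collector.Characteristics... ch) {
        Set<Collector.Characteristics> all =
                EnumSet.noneOf(Collector.Characteristics.class);
        all.addAll(c.characteristics());
        Collections.addAll(all, ch);
        return Collector.of(c.supplier(), c.accumulator(), c.combiner(), c.finisher(),
                all.toArray(new Collector.Characteristics[0]));
    }

    static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
        return characteristics(
                Collectors.toCollection(() -> new HashSet<>(initialCapacity)),
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        int expected = 1_000;
        // Assumption: default load factor 0.75, so this capacity avoids rehashing.
        int capacity = (int) (expected / 0.75f) + 1;
        Set<Integer> set = IntStream.range(0, expected).boxed()
                .collect(toSetSized(capacity));
        System.out.println(set.size()); // 1000
        System.out.println(SizedSetDemo.<Integer>toSetSized(capacity).characteristics());
    }
}
```

The resulting collector reports both IDENTITY_FINISH (inherited from toCollection) and the added UNORDERED.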
This limits the effort necessary to provide your characteristics (if it is a recurring problem), so it won’t hurt to provide them, even if you don’t know how much impact it will have.