Is it important to use Characteristics.UNORDERED in Collectors when possible?

说谎 2021-02-09 01:10

Since I use streams a great deal, some of them dealing with a large amount of data, I thought it would be a good idea to pre-allocate my collection-based collectors with an appr

1 answer
  • 2021-02-09 01:46

    First of all, the UNORDERED characteristic of a Collector exists solely to aid performance. There is nothing wrong with a Collector that lacks this characteristic even though it does not depend on the encounter order.
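
    To see which built-in collectors report the characteristic, you can simply ask them. The following is a minimal sketch; the class and method names are made up for illustration. Collectors.toSet() is documented as an unordered collector, whereas the output for toCollection(HashSet::new) is an implementation detail of the JDK you run it on (OpenJDK, as far as I know, reports only IDENTITY_FINISH there, although a HashSet does not care about encounter order either).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of any library.
class CharacteristicsDemo {
    // Characteristics reported by the built-in toSet() collector.
    static Set<Collector.Characteristics> ofToSet() {
        Collector<String, ?, Set<String>> c = Collectors.toSet();
        return c.characteristics();
    }

    // Characteristics reported by toCollection(HashSet::new).
    static Set<Collector.Characteristics> ofToCollection() {
        Collector<String, ?, HashSet<String>> c = Collectors.toCollection(HashSet::new);
        return c.characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toSet():                    " + ofToSet());
        System.out.println("toCollection(HashSet::new): " + ofToCollection());
    }
}
```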

    Whether this characteristic has an impact depends on the stream operations themselves and on implementation details. While the current implementation may not draw much advantage from it, due to the difficulties with back-propagation, that doesn’t imply that future versions won’t. Of course, a stream that is already unordered is not affected by the UNORDERED characteristic of the Collector, and not all stream operations have the potential to benefit from it.
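
    If you want to check whether a particular stream has an encounter order to begin with, one way is to look at the ORDERED flag of its spliterator. This is only a rough sketch with made-up names (EncounterOrderDemo, isOrdered); note that calling spliterator() is a terminal operation, so the stream cannot be reused afterwards.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
class EncounterOrderDemo {
    // True if the stream reports an encounter order. Consumes the stream.
    static boolean isOrdered(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        // A list-backed stream has an encounter order.
        System.out.println("list:             " + isOrdered(list.stream()));
        // A HashSet-backed stream has none, so UNORDERED on the collector adds nothing.
        System.out.println("hash set:         " + isOrdered(new HashSet<>(list).stream()));
        // unordered() drops the ordering constraint explicitly.
        System.out.println("list.unordered(): " + isOrdered(list.stream().unordered()));
    }
}
```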

    So the more important question is how much it matters not to prevent such potential optimizations (perhaps in the future).

    Note that other unspecified implementation details also affect the potential optimizations when it comes to your second variant. The toCollection(Supplier) collector has unspecified inner workings and only guarantees to provide a final result of the type produced by the Supplier. In contrast, Collector.of(() -> new HashSet<>(initialCapacity), Set::add, (c1, c2) -> { c1.addAll(c2); return c1; }, IDENTITY_FINISH, UNORDERED) defines precisely how the collector ought to work and may therefore also hinder internal optimizations of collection-producing collectors in future versions.
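
    For reference, the fully hand-written variant mentioned above could be wrapped up like this. It is just a sketch: the class name SizedSetCollector and the method name unorderedHashSet are invented for the example, and the lambdas are the ones quoted in the paragraph above.

```java
import static java.util.stream.Collector.Characteristics.IDENTITY_FINISH;
import static java.util.stream.Collector.Characteristics.UNORDERED;

import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
class SizedSetCollector {
    // Spells out supplier, accumulator and combiner by hand, so every aspect
    // of the collector is fixed by this code rather than by the JDK.
    static <T> Collector<T, ?, Set<T>> unorderedHashSet(int initialCapacity) {
        return Collector.of(
                () -> new HashSet<>(initialCapacity),        // pre-sized container
                Set::add,                                    // accumulate one element
                (c1, c2) -> { c1.addAll(c2); return c1; },   // merge partial results
                IDENTITY_FINISH, UNORDERED);
    }

    public static void main(String[] args) {
        Set<String> set = Stream.of("a", "b", "a").collect(unorderedHashSet(16));
        System.out.println(set.size() + " distinct elements");
        System.out.println(unorderedHashSet(16).characteristics());
    }
}
```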

    So the best solution would be a way to specify the characteristics without touching the other aspects of a Collector, but as far as I know, the existing API offers no simple way to do that. It’s easy to build such a facility yourself, though:

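    // Needs java.util.Collections, java.util.EnumSet, java.util.Set and java.util.stream.Collector.
    // Returns a collector that works like c but additionally reports the characteristics in ch.
    public static <T,A,R> Collector<T,A,R> characteristics(
                          Collector<T,A,R> c, Collector.Characteristics... ch) {
        Set<Collector.Characteristics> o = c.characteristics();
        if(!o.isEmpty()) {
            // Merge the existing characteristics with the requested ones.
            o=EnumSet.copyOf(o);
            Collections.addAll(o, ch);
            // A fresh array avoids a null-padded result, which can occur if ch contains duplicates.
            ch=o.toArray(new Collector.Characteristics[0]);
        }
        return Collector.of(c.supplier(), c.accumulator(), c.combiner(), c.finisher(), ch);
    }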
    

    With that method, it’s easy to write, e.g.:

    HashSet<String> set=stream
        .collect(characteristics(toCollection(()->new HashSet<>(capacity)), UNORDERED));
    

    or to provide your own factory method:

    public static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
        return characteristics(toCollection(()-> new HashSet<>(initialCapacity)), UNORDERED);
    }
    

    This limits the effort necessary to provide your characteristics (if it is a recurring problem), so it won’t hurt to provide them, even if you don’t know how much impact they will have.
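
    If you want to try the approach end to end, here is a self-contained sketch. The class name SizedCollectors is made up, and the helper is a slight variation that always merges the characteristics into a fresh EnumSet. The explicit type witness on toCollection is only there to keep type inference simple. It is not a benchmark and says nothing about how much the characteristic actually helps.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of any library.
class SizedCollectors {
    // Wraps c so that it additionally reports the given characteristics;
    // supplier, accumulator, combiner and finisher are reused unchanged.
    static <T, A, R> Collector<T, A, R> characteristics(
            Collector<T, A, R> c, Collector.Characteristics... ch) {
        EnumSet<Collector.Characteristics> all =
                EnumSet.noneOf(Collector.Characteristics.class);
        all.addAll(c.characteristics());
        Collections.addAll(all, ch);
        return Collector.of(c.supplier(), c.accumulator(), c.combiner(), c.finisher(),
                all.toArray(new Collector.Characteristics[0]));
    }

    // Pre-sized HashSet collector that also declares itself UNORDERED.
    static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
        return characteristics(
                Collectors.<T, Set<T>>toCollection(() -> new HashSet<>(initialCapacity)),
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Set<Integer> set = IntStream.range(0, 1000).boxed().parallel()
                .collect(toSetSized(2048));
        System.out.println("collected " + set.size() + " elements");
        System.out.println(toSetSized(16).characteristics());
    }
}
```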
