Can I duplicate a Stream in Java 8?

别跟我提以往 2020-12-08 04:03

Sometimes I want to perform a set of operations on a stream, and then process the resulting stream two different ways with other operations.

Can I do this without ha

8 Answers
  • 2020-12-08 04:12

    It's possible if you're buffering elements that you've consumed in one duplicate, but not in the other yet.

    We've implemented a duplicate() method for streams in jOOλ, an Open Source library that we created to improve integration testing for jOOQ. Essentially, you can just write:

    Tuple2<Seq<Integer>, Seq<Integer>> desired_streams = Seq.seq(
        IntStream.range(1, 100).filter(n -> n % 2 == 0).boxed()
    ).duplicate();
    

    (note: we currently need to box the stream, as we haven't implemented an IntSeq yet)

    Internally, there is a LinkedList buffer storing all values that have been consumed from one stream but not yet from the other. That's probably as efficient as it gets if your two streams are consumed at about the same rate.

    Here's how the algorithm works:

    static <T> Tuple2<Seq<T>, Seq<T>> duplicate(Stream<T> stream) {
        final LinkedList<T> gap = new LinkedList<>();
        final Iterator<T> it = stream.iterator();
    
        @SuppressWarnings("unchecked")
        final Iterator<T>[] ahead = new Iterator[] { null };
    
        class Duplicate implements Iterator<T> {
            @Override
            public boolean hasNext() {
                if (ahead[0] == null || ahead[0] == this)
                    return it.hasNext();
    
                return !gap.isEmpty();
            }
    
            @Override
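            public T next() {
                if (ahead[0] == null)
                    ahead[0] = this;
    
                // The iterator that is ahead pulls from the source and
                // buffers each value for the other one.
                if (ahead[0] == this) {
                    T value = it.next();
                    gap.offer(value);
                    return value;
                }
    
                // The iterator that is behind drains the buffer. Once it has
                // caught up, nobody is ahead any more; without this reset its
                // hasNext() would report false while the source still has elements.
                T value = gap.poll();
                if (gap.isEmpty())
                    ahead[0] = null;
                return value;
            }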
        }
    
        return tuple(seq(new Duplicate()), seq(new Duplicate()));
    }
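
For readers without jOOλ on the classpath, the same buffering idea can be sketched with plain JDK types. This is not jOOλ's implementation: it is a simplified variant that keeps one queue per consumer instead of a single shared gap list, and the names DuplicateSketch, duplicate, view and demo are made up for illustration. It is sequential only and not thread-safe.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class DuplicateSketch {

    /** Returns two iterators that each see every element of the source stream. */
    static <T> List<Iterator<T>> duplicate(Stream<T> stream) {
        Iterator<T> source = stream.iterator();
        // One queue per consumer: it holds the elements the other consumer
        // has already pulled from the source but this one has not seen yet.
        Queue<T> pendingA = new LinkedList<>();
        Queue<T> pendingB = new LinkedList<>();
        List<Iterator<T>> views = new ArrayList<>();
        views.add(view(source, pendingA, pendingB));
        views.add(view(source, pendingB, pendingA));
        return views;
    }

    private static <T> Iterator<T> view(Iterator<T> source, Queue<T> mine, Queue<T> other) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !mine.isEmpty() || source.hasNext();
            }

            @Override
            public T next() {
                if (!mine.isEmpty())
                    return mine.poll();      // replay what the other view already pulled
                T value = source.next();     // otherwise advance the shared source...
                other.offer(value);          // ...and remember the value for the other view
                return value;
            }
        };
    }

    /** Interleaves reads from both views and records what each call returned. */
    static List<Integer> demo() {
        List<Iterator<Integer>> copies =
            duplicate(IntStream.range(1, 100).filter(n -> n % 2 == 0).boxed());
        Iterator<Integer> a = copies.get(0);
        Iterator<Integer> b = copies.get(1);
        List<Integer> seen = new ArrayList<>();
        seen.add(a.next()); // 2, pulled from the source
        seen.add(a.next()); // 4, pulled from the source
        seen.add(b.next()); // 2, replayed from b's queue
        seen.add(b.next()); // 4, replayed from b's queue
        seen.add(b.next()); // 6, b is now the one pulling from the source
        seen.add(a.next()); // 6, replayed from a's queue
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [2, 4, 2, 4, 6, 6]
    }
}
```

As with the jOOλ version, memory use grows with the distance between the two consumers, so this only stays cheap when both are read at a similar pace.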
    

    More source code here

    In fact, using jOOλ, you'll be able to write a complete one-liner like so:

    Tuple2<Seq<Integer>, Seq<Integer>> desired_streams = Seq.seq(
        IntStream.range(1, 100).filter(n -> n % 2 == 0).boxed()
    ).duplicate()
     .map1(s -> s.filter(n -> n % 7 == 0))
     .map2(s -> s.filter(n -> n % 5 == 0));
    
    // This will yield 14, 28, 42, 56...
    desired_streams.v1.forEach(System.out::println);
    
    // This will yield 10, 20, 30, 40...
    desired_streams.v2.forEach(System.out::println);
    
  • 2020-12-08 04:16

    It is not possible to duplicate a stream in this way. However, you can avoid the code duplication by moving the common part into a method or lambda expression.

    Supplier<IntStream> supplier = () ->
        IntStream.range(1, 100).filter(n -> n % 2 == 0);
    supplier.get().filter(...);
    supplier.get().filter(...);
    
  • 2020-12-08 04:19

    I used this great answer to write the following class:

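    import java.util.*;
    import java.util.function.*;
    import java.util.stream.*;
    
    // Every call delegates to a fresh stream obtained from the supplier.
    public class SplitStream<T> implements Stream<T> {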
        private final Supplier<Stream<T>> streamSupplier;
    
        public SplitStream(Supplier<Stream<T>> t) {
            this.streamSupplier = t;
        }
    
        @Override
        public Stream<T> filter(Predicate<? super T> predicate) {
            return streamSupplier.get().filter(predicate);
        }
    
        @Override
        public <R> Stream<R> map(Function<? super T, ? extends R> mapper) {
            return streamSupplier.get().map(mapper);
        }
    
        @Override
        public IntStream mapToInt(ToIntFunction<? super T> mapper) {
            return streamSupplier.get().mapToInt(mapper);
        }
    
        @Override
        public LongStream mapToLong(ToLongFunction<? super T> mapper) {
            return streamSupplier.get().mapToLong(mapper);
        }
    
        @Override
        public DoubleStream mapToDouble(ToDoubleFunction<? super T> mapper) {
            return streamSupplier.get().mapToDouble(mapper);
        }
    
        @Override
        public <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper) {
            return streamSupplier.get().flatMap(mapper);
        }
    
        @Override
        public IntStream flatMapToInt(Function<? super T, ? extends IntStream> mapper) {
            return streamSupplier.get().flatMapToInt(mapper);
        }
    
        @Override
        public LongStream flatMapToLong(Function<? super T, ? extends LongStream> mapper) {
            return streamSupplier.get().flatMapToLong(mapper);
        }
    
        @Override
        public DoubleStream flatMapToDouble(Function<? super T, ? extends DoubleStream> mapper) {
            return streamSupplier.get().flatMapToDouble(mapper);
        }
    
        @Override
        public Stream<T> distinct() {
            return streamSupplier.get().distinct();
        }
    
        @Override
        public Stream<T> sorted() {
            return streamSupplier.get().sorted();
        }
    
        @Override
        public Stream<T> sorted(Comparator<? super T> comparator) {
            return streamSupplier.get().sorted(comparator);
        }
    
        @Override
        public Stream<T> peek(Consumer<? super T> action) {
            return streamSupplier.get().peek(action);
        }
    
        @Override
        public Stream<T> limit(long maxSize) {
            return streamSupplier.get().limit(maxSize);
        }
    
        @Override
        public Stream<T> skip(long n) {
            return streamSupplier.get().skip(n);
        }
    
        @Override
        public void forEach(Consumer<? super T> action) {
            streamSupplier.get().forEach(action);
        }
    
        @Override
        public void forEachOrdered(Consumer<? super T> action) {
            streamSupplier.get().forEachOrdered(action);
        }
    
        @Override
        public Object[] toArray() {
            return streamSupplier.get().toArray();
        }
    
        @Override
        public <A> A[] toArray(IntFunction<A[]> generator) {
            return streamSupplier.get().toArray(generator);
        }
    
        @Override
        public T reduce(T identity, BinaryOperator<T> accumulator) {
            return streamSupplier.get().reduce(identity, accumulator);
        }
    
        @Override
        public Optional<T> reduce(BinaryOperator<T> accumulator) {
            return streamSupplier.get().reduce(accumulator);
        }
    
        @Override
        public <U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner) {
            return streamSupplier.get().reduce(identity, accumulator, combiner);
        }
    
        @Override
        public <R> R collect(Supplier<R> supplier, BiConsumer<R, ? super T> accumulator, BiConsumer<R, R> combiner) {
            return streamSupplier.get().collect(supplier, accumulator, combiner);
        }
    
        @Override
        public <R, A> R collect(Collector<? super T, A, R> collector) {
            return streamSupplier.get().collect(collector);
        }
    
        @Override
        public Optional<T> min(Comparator<? super T> comparator) {
            return streamSupplier.get().min(comparator);
        }
    
        @Override
        public Optional<T> max(Comparator<? super T> comparator) {
            return streamSupplier.get().max(comparator);
        }
    
        @Override
        public long count() {
            return streamSupplier.get().count();
        }
    
        @Override
        public boolean anyMatch(Predicate<? super T> predicate) {
            return streamSupplier.get().anyMatch(predicate);
        }
    
        @Override
        public boolean allMatch(Predicate<? super T> predicate) {
            return streamSupplier.get().allMatch(predicate);
        }
    
        @Override
        public boolean noneMatch(Predicate<? super T> predicate) {
            return streamSupplier.get().noneMatch(predicate);
        }
    
        @Override
        public Optional<T> findFirst() {
            return streamSupplier.get().findFirst();
        }
    
        @Override
        public Optional<T> findAny() {
            return streamSupplier.get().findAny();
        }
    
        @Override
        public Iterator<T> iterator() {
            return streamSupplier.get().iterator();
        }
    
        @Override
        public Spliterator<T> spliterator() {
            return streamSupplier.get().spliterator();
        }
    
        @Override
        public boolean isParallel() {
            return streamSupplier.get().isParallel();
        }
    
        @Override
        public Stream<T> sequential() {
            return streamSupplier.get().sequential();
        }
    
        @Override
        public Stream<T> parallel() {
            return streamSupplier.get().parallel();
        }
    
        @Override
        public Stream<T> unordered() {
            return streamSupplier.get().unordered();
        }
    
        @Override
        public Stream<T> onClose(Runnable closeHandler) {
            return streamSupplier.get().onClose(closeHandler);
        }
    
        @Override
        public void close() {
            streamSupplier.get().close();
        }
    }
    

    When you call any method of this class, it delegates the call to

    streamSupplier.get()
    

    So, instead of:

    Supplier<IntStream> supplier = () ->
        IntStream.range(1, 100).filter(n -> n % 2 == 0);
    supplier.get().filter(...);
    supplier.get().filter(...);
    

    You can do:

    SplitStream<Integer> stream = 
        new SplitStream<>(() -> IntStream.range(1, 100).filter(n -> n % 2 == 0).boxed());
    stream.filter(...);
    stream.filter(...);
    

    You can extend it to work with IntStream, DoubleStream, etc.

  • 2020-12-08 04:21

    Update: This doesn't work. See explanation below, after the text of the original answer.

    How silly of me. All that I need to do is:

    IntStream desired_stream = IntStream.range(1, 100).filter(n -> n % 2 == 0);
    IntStream stream14 = desired_stream.filter(n -> n % 7 == 0); // multiples of 14
    IntStream stream10 = desired_stream.filter(n -> n % 5 == 0); // multiples of 10
    

    Explanation why this does not work:

    If you code it up and try to collect both streams, the first one will collect fine, but trying to stream the second one will throw the exception: java.lang.IllegalStateException: stream has already been operated upon or closed.

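    To elaborate, streams are stateful objects (which, by the way, cannot be reset or rewound). You can think of them as iterators, which in turn are like pointers. So stream14 and stream10 can be thought of as references to the same pointer. Consuming the first stream all the way will cause the pointer to go "past the end." Trying to consume the second stream is like trying to access a pointer that is already "past the end," which naturally is an illegal operation. In fact, the check happens even earlier: a stream accepts only one downstream operation, so the second filter() call on the same stream throws whether or not the first derived stream was ever consumed, as the stack trace below shows.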

    As the accepted answer shows, the code to create the stream must be executed twice but it can be compartmentalized into a Supplier lambda or a similar construct.

    Full test code: save into Foo.java, then javac Foo.java, then java Foo

    import java.util.stream.IntStream;
    
    public class Foo {
      public static void main (String [] args) {
        IntStream s = IntStream.range(0, 100).filter(n -> n % 2 == 0);
        IntStream s1 = s.filter(n -> n % 5 == 0);
        s1.forEach(n -> System.out.println(n));
        IntStream s2 = s.filter(n -> n % 7 == 0);
        s2.forEach(n -> System.out.println(n));
      }
    }
    

    Output:

    $ javac Foo.java
    $ java Foo
    0
    10
    20
    30
    40
    50
    60
    70
    80
    90
    Exception in thread "main" java.lang.IllegalStateException: stream has already been operated upon or closed
        at java.util.stream.AbstractPipeline.<init>(AbstractPipeline.java:203)
        at java.util.stream.IntPipeline.<init>(IntPipeline.java:91)
        at java.util.stream.IntPipeline$StatelessOp.<init>(IntPipeline.java:592)
        at java.util.stream.IntPipeline$9.<init>(IntPipeline.java:332)
        at java.util.stream.IntPipeline.filter(IntPipeline.java:331)
        at Foo.main(Foo.java:8)
    
  • 2020-12-08 04:22

    It is not possible in general.

    If you want to duplicate an input stream, or input iterator, you have two options:

    A. Keep everything in a collection, say a List<>

    Suppose you duplicate a stream into two streams s1 and s2. If you have advanced n1 elements in s1 and n2 elements in s2, you must keep |n2 - n1| elements in memory, just to keep pace. If your stream is infinite, there may be no upper bound for the storage required.

    Take a look at Python's tee() to see what it takes:

    This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
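
In Java, option A can be sketched by collecting the shared part of the pipeline into a List once and opening a new stream over it for each use. This is a minimal sketch that only works for finite streams; the class and method names (BufferedCopies, evens, multiplesOf) are made up for illustration, and the pipeline is the one from the question.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class BufferedCopies {

    /** Runs the shared pipeline once and stores every element in memory. */
    static List<Integer> evens() {
        return IntStream.range(1, 100)
                .filter(n -> n % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    /** Each call to stream() starts a fresh traversal of the stored elements. */
    static List<Integer> multiplesOf(List<Integer> buffered, int factor) {
        return buffered.stream()
                .filter(n -> n % factor == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> buffered = evens();
        System.out.println(multiplesOf(buffered, 7)); // [14, 28, 42, 56, 70, 84, 98]
        System.out.println(multiplesOf(buffered, 5)); // [10, 20, 30, 40, 50, 60, 70, 80, 90]
    }
}
```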

    B. When possible: Copy the state of the generator that creates the elements

    For this option to work, you'll probably need access to the inner workings of the stream. In other words, the generator - the part that creates the elements - should support copying in the first place. [OP: See this great answer, as an example of how this can be done for the example in the question]

    It will not work on input from the user, since you'd have to copy the state of the entire "outside world". Java's Stream does not support copying, since it is designed to be as general as possible; for example, to work with files, network, keyboard, sensors, randomness etc. [OP: Another example is a stream that reads a temperature sensor on demand. It cannot be duplicated without storing a copy of the readings]
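
As a rough illustration of option B, here is a generator whose entire state is a single field, which makes it trivial to copy. CopyableCounter, copy and firstThree are hypothetical names; nothing like copy() exists on java.util.stream.Stream itself, which is exactly the limitation described above.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;
import java.util.stream.IntStream;

class CopyableCounter implements IntSupplier {
    private int next; // the generator's complete state

    CopyableCounter(int start) {
        this.next = start;
    }

    /** Duplicates the generator by copying its state. */
    CopyableCounter copy() {
        return new CopyableCounter(next);
    }

    @Override
    public int getAsInt() {
        return next++;
    }

    /** Builds a stream on top of the given generator and takes three elements. */
    static int[] firstThree(CopyableCounter generator) {
        return IntStream.generate(generator).limit(3).toArray();
    }

    public static void main(String[] args) {
        CopyableCounter original = new CopyableCounter(10);
        CopyableCounter duplicate = original.copy(); // copied before either is consumed

        System.out.println(Arrays.toString(firstThree(original)));  // [10, 11, 12]
        System.out.println(Arrays.toString(firstThree(duplicate))); // [10, 11, 12]
    }
}
```

The two streams are independent because each is backed by its own copy of the state. A generator that reads a sensor or a socket has no such state to copy.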

    This is not only the case in Java; this is a general rule. You can see that std::istream in C++ only supports move semantics, not copy semantics ("copy constructor (deleted)"), for this reason (and others).

  • 2020-12-08 04:31

    You can also move the stream generation into a separate method that returns the stream, and call it twice.
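
A minimal sketch of that idea, using the pipeline from the question (StreamFactory and evens are made-up names):

```java
import java.util.stream.IntStream;

class StreamFactory {

    /** Builds a fresh instance of the shared pipeline on every call. */
    static IntStream evens() {
        return IntStream.range(1, 100).filter(n -> n % 2 == 0);
    }

    public static void main(String[] args) {
        // Two independent streams, so neither call interferes with the other.
        long multiplesOf14 = evens().filter(n -> n % 7 == 0).count();
        long multiplesOf10 = evens().filter(n -> n % 5 == 0).count();
        System.out.println(multiplesOf14 + " " + multiplesOf10); // 7 9
    }
}
```

Note that the source is traversed twice, so this suits sources that are cheap and repeatable to regenerate.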
