Is flatMap guaranteed to be lazy? [duplicate]

假如想象 提交于 2019-12-17 12:03:12

问题


Consider the following code:

urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();

Will fetchDataFromInternet be called for second url when the first one was enough?

I tried with a smaller example and it looks like working as expected. i.e processes data one by one but can this behavior be relied on? If not, does calling .sequential() before .flatMap(...) help?

    Stream.of("one", "two", "three")
            .flatMap(num -> {
                System.out.println("Processing " + num);
                // return FetchFromInternetForNum(num).data().stream();
                return Stream.of(num);
            })
            .peek(num -> System.out.println("Peek before filter: "+ num))
            .filter(num -> num.length() > 0)
            .peek(num -> System.out.println("Peek after filter: "+ num))
            .forEach(num -> {
                System.out.println("Done " + num);
            });

Output:

Processing one
Peek before filter: one
Peek after filter: one
Done one
Processing two
Peek before filter: two
Peek after filter: two
Done two
Processing three
Peek before filter: three
Peek after filter: three
Done three

Update: Using official Oracle JDK8 if that matters on implementation

Answer: Based on the comments and the answers below, flatmap is partially lazy. i.e reads the first stream fully and only when required, it goes for next. Reading a stream is eager but reading multiple streams is lazy.

If this behavior is intended, the API should let the function return an Iterable instead of a stream.

In other words: link


回答1:


Under the current implementation, flatmap is eager; like any other stateful intermediate operation (like sorted and distinct). And it's very easy to prove :

 int result = Stream.of(1)
            .flatMap(x -> Stream.generate(() -> ThreadLocalRandom.current().nextInt()))
            .findFirst()
            .get();

    System.out.println(result);

This never finishes as flatMap is computed eagerly. For your example:

urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();

It means that for each url, the flatMap will block all others operation that come after it, even if you care about a single one. So let's suppose that from a single url your fetchDataFromInternet(url) generates 10_000 lines, well your findFirst will have to wait for all 10_000 to be computed, even if you care about only one.

EDIT

This is fixed in Java 10, where we get our laziness back: see JDK-8075939

EDIT 2

This is fixed in Java 8 too (8u222): JDK-8225328




回答2:


It’s not clear why you set up an example that does not address the actual question, you’re interested in. If you want to know, whether the processing is lazy when applying a short-circuiting operation like findFirst(), well, then use an example using findFirst() instead of forEach that processes all elements anyway. Also, put the logging statement right into the function whose evaluation you want to track:

Stream.of("hello", "world")
      .flatMap(s -> {
          System.out.println("flatMap function evaluated for \""+s+'"');
          return s.chars().boxed();
      })
      .peek(c -> System.out.printf("processing element %c%n", c))
      .filter(c -> c>'h')
      .findFirst()
      .ifPresent(c -> System.out.printf("found an %c%n", c));
flatMap function evaluated for "hello"
processing element h
processing element e
processing element l
processing element l
processing element o
found an l

This demonstrates that the function passed to flatMap gets evaluated lazily as expected while the elements of the returned (sub-)stream are not evaluated as lazy as possible, as already discussed in the Q&A you have linked yourself.

So, regarding your fetchDataFromInternet method that gets invoked from the function passed to flatMap, you will get the desired laziness. But not for the data it returns.




回答3:


Today I also stumbled up on this bug. Behavior is not so strait forward, cause simple case, like below, is working fine, but similar production code doesn't work.

 stream(spliterator).map(o -> o).flatMap(Stream::of)..flatMap(Stream::of).findAny()

For guys who cannot wait another couple years for migration to JDK-10 there is a alternative true lazy stream. It doesn't support parallel. It was dedicated for JavaScript translation, but it worked out for me, cause interface is the same.

StreamHelper is collection based, but it is easy to adapt Spliterator.

https://github.com/yaitskov/j4ts/blob/stream/src/main/java/javaemul/internal/stream/StreamHelper.java



来源:https://stackoverflow.com/questions/46288915/is-flatmap-guaranteed-to-be-lazy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!