In which cases should Stream operations be stateful?

轮回少年 2021-01-01 03:30

In the Javadoc for the stream package, at the end of the section Parallelism, I read:

Most stream operations accept parameters that des

4 answers
  • 2021-01-01 04:15

    A stateless function returns the same output for the same inputs, "no matter what".

    It's easy to create non-stateless functions in an imperative language like Java. e.g.

        Function<String, Long> func = input -> System.currentTimeMillis();  // ignores its input; the result changes between calls
    

    If we do stream.map(func) with a stateful func, the resulting stream will depend on how func is invoked at runtime; the behavior of the application will be hard to understand (but not that hard).

    If func is stateless, stream.map(func) will always produce the same stream, no matter how map is implemented and executed. This is nice and desirable.

    Note that "no matter what" implies that a stateless function must be thread-safe.
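To make the difference concrete, here is a minimal sketch that runs a stateless mapper and a stateful mapper on a parallel stream. The class and method names (`MapperStateDemo`, `doubled`, `numbered`) are made up for illustration and are not part of the answer above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class MapperStateDemo {

    // Stateless mapper: the result depends only on the argument.
    static List<Integer> doubled(List<Integer> input) {
        return input.parallelStream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    // Stateful mapper: the result also depends on how many calls came before,
    // so a parallel run may pair elements with counter values differently each time.
    static List<String> numbered(List<Integer> input) {
        AtomicInteger calls = new AtomicInteger();
        return input.parallelStream()
                .map(x -> calls.incrementAndGet() + ":" + x)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.rangeClosed(1, 8).boxed().collect(Collectors.toList());
        System.out.println(doubled(input));   // same list on every run
        System.out.println(numbered(input));  // the pairing may vary from run to run
    }
}
```

The `AtomicInteger` keeps the counter thread-safe, but thread safety alone does not make the mapper stateless: which element receives which number still depends on scheduling.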


    If a function returns void, isn't it always stateless? Well... there's another connotation of stateless - invoking a stateless function should not have side effects that are "important" to the application.

    If func has no "important" side effects, it's safe to invoke func arbitrarily. For example, stream.map(func) can safely invoke func multiple times even on the same element. (But don't worry, Stream is never gonna do that.)

    What is an "important" side effect? That is very subjective.

    At the very least, invoking func will cost some CPU time, which is not exactly free. This might be a concern for performance-critical applications, or on expensive platforms (cough AWS).

    If func logs something to the hard disk, it may or may not be an "important" side effect. (It too costs $$.)

    If func queries an external service that costs dearly, it is very concerning; it can bankrupt you.

    Now, forget about money. Purely from the application-logic point of view, func could mutate some state that the application depends on; even if func returns the same output for the same inputs, it still cannot be considered "stateless". For example, if in stream.map(func), func adds each element to a list, and later the application uses the list, the resulting list will depend on how func is invoked at runtime. This is frowned upon by functional programmers.
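The list-mutating mapper described above can be sketched as follows, next to a version that leaves the accumulation to `collect`. The names (`SideEffectDemo`, `upperWithSideEffect`, `upper`, `seen`) are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SideEffectDemo {

    // Discouraged: the mapper mutates a list that lives outside the pipeline.
    // What ends up in 'seen', and in which order, depends on how the stream calls the mapper.
    static List<String> upperWithSideEffect(List<String> words, List<String> seen) {
        return words.stream()
                .map(w -> { seen.add(w); return w.toUpperCase(); })
                .collect(Collectors.toList());
    }

    // Preferred: no shared mutation; the pipeline itself builds the result.
    static List<String> upper(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        System.out.println(upperWithSideEffect(Arrays.asList("a", "b"), seen));
        System.out.println(upper(Arrays.asList("a", "b")));
    }
}
```

If the first variant were switched to a parallel stream, the unsynchronized `ArrayList` would additionally be accessed from several threads at once, which is a correctness problem on top of the ordering question.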

    If we do stream.forEach( e->log(e) ), is it stateless? We can consider it stateless if

    • we don't care about the cost of log
    • log() can be invoked concurrently
    • we don't care about the order of log entries
    • log entries have no impact on this application's logic
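As a rough illustration of the ordering condition, the sketch below uses a `ConcurrentLinkedQueue` as a stand-in for a logger that tolerates concurrent calls (an assumption; a real logger may behave differently). `forEach` on a parallel stream gives no ordering guarantee, while `forEachOrdered` follows the encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class LogOrderDemo {

    // forEach: entries may arrive in any order when the stream is parallel.
    static List<String> logUnordered(List<String> items) {
        Queue<String> log = new ConcurrentLinkedQueue<>();  // stand-in for a thread-safe log
        items.parallelStream().forEach(log::add);
        return new ArrayList<>(log);
    }

    // forEachOrdered: entries arrive in encounter order, at the cost of some parallelism.
    static List<String> logOrdered(List<String> items) {
        Queue<String> log = new ConcurrentLinkedQueue<>();
        items.parallelStream().forEachOrdered(log::add);
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        System.out.println(logUnordered(items));  // all four entries, order not guaranteed
        System.out.println(logOrdered(items));    // same order as the source list
    }
}
```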
  • 2021-01-01 04:18

    I have a hard time understanding this "in most cases". In which cases is it acceptable/desirable to have a stateful stream operation?

    Suppose the following scenario. You have a Stream<String> and you need to list the items in natural order, prefixing each one with its position number. So, for example, the input is Banana, Apple and Grape. The output should be:

    1. Apple
    2. Banana
    3. Grape
    

    How do you solve this task with the Java Stream API? Pretty easily:

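    import static java.util.Arrays.asList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.stream.Collectors;

    List<String> f = asList("Banana", "Apple", "Grape");

    // Counter shared across mapper calls: this is what makes map() stateful here.
    AtomicInteger number = new AtomicInteger(0);
    String result = f.stream()
      .sorted()
      .sequential()  // run sequentially so the counter follows encounter order
      .map(i -> String.format("%d. %s", number.incrementAndGet(), i))
      .collect(Collectors.joining("\n"));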
    

    Now if you look at this pipeline you'll see 3 stateful operations:

    • sorted() – stateful by definition. See the documentation for Stream.sorted():

      This is a stateful intermediate operation

    • map() – by itself could be stateless or not, but in this case it is not. To label positions you need to keep track of how many items have already been labeled;
    • collect() – is a mutable reduction operation (per the docs for Stream.collect()). Mutable operations are stateful by definition, because they change (mutate) shared state.
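For comparison, the same numbered output can be produced without a stateful mapper by sorting first and then streaming over the indices. This is an alternative sketch rather than the approach of the answer above; the class name `NumberedListDemo` is made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class NumberedListDemo {

    static String numbered(List<String> items) {
        // Sort once into a list so every position can be looked up by index.
        List<String> sorted = items.stream().sorted().collect(Collectors.toList());
        // The mapper depends only on its argument i and the already-built list,
        // so it keeps no counter between calls.
        return IntStream.range(0, sorted.size())
                .mapToObj(i -> String.format("%d. %s", i + 1, sorted.get(i)))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(numbered(Arrays.asList("Banana", "Apple", "Grape")));
        // prints:
        // 1. Apple
        // 2. Banana
        // 3. Grape
    }
}
```

Because the index comes from the stream element itself, this version does not rely on `sequential()` to keep the numbering correct.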

    There is some controversy about why sorted() is stateful. From the Stream API documentation:

    Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element -- each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.

    So when the terms stateful and stateless are applied to the Stream API, they mostly describe the function that processes each element of a stream, not the function that processes the stream as a whole.

    Also note that there is some confusion between the terms stateless and deterministic. They are not the same.

    A deterministic function provides the same result given the same arguments.

    A stateless function retains no state from previous calls.

    These are different definitions, and in general neither depends on the other. Determinism is about a function's result value, while statelessness is about its implementation.
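A small sketch may help separate the two notions. The names `cachedSquare` and `stamped` are invented for this example: the first function keeps state yet is deterministic, the second keeps no state yet is not deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class DeterminismVsStateDemo {

    // Stateful but deterministic: remembers earlier results in a cache,
    // yet always returns the same value for the same argument.
    static Function<Integer, Integer> cachedSquare() {
        Map<Integer, Integer> cache = new ConcurrentHashMap<>();
        return n -> cache.computeIfAbsent(n, k -> k * k);
    }

    // Stateless but non-deterministic: retains nothing between calls,
    // yet the result depends on the clock and can differ for the same argument.
    static Function<Integer, Long> stamped() {
        return n -> n + System.currentTimeMillis();
    }

    public static void main(String[] args) {
        Function<Integer, Integer> square = cachedSquare();
        System.out.println(square.apply(4));  // 16
        System.out.println(square.apply(4));  // 16 again, this time served from the cache
        System.out.println(stamped().apply(4));  // varies with the current time
    }
}
```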

  • 2021-01-01 04:29

    Examples of stateful stream lambdas:

    • collect(Collector): The Collector is by definition stateful, since it has to collect all the elements in a collection (state).
    • forEach(Consumer): The Consumer is by definition stateful, well except if it's a black hole (no-op).
    • peek(Consumer): The Consumer is by definition stateful, because why peek if not to store it somewhere (e.g. log).

    So, Collector and Consumer are two lambda interfaces that by definition are stateful.

    All the others, e.g. Predicate, Function, UnaryOperator, BinaryOperator, and Comparator, should be stateless.
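To show where a Collector's state lives, here is a minimal hand-written list collector built with `Collector.of`. The helper name `toArrayList` is hypothetical; in practice `Collectors.toList()` already covers this case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CollectorStateDemo {

    // The collector's state is the ArrayList handed out by the supplier.
    static <T> Collector<T, List<T>, List<T>> toArrayList() {
        return Collector.of(
                ArrayList::new,   // supplier: a fresh container for each run or parallel chunk
                List::add,        // accumulator: mutates the container
                (left, right) -> { left.addAll(right); return left; });  // combiner: merges chunks
    }

    public static void main(String[] args) {
        List<String> result = Stream.of("a", "b", "c")
                .parallel()
                .collect(CollectorStateDemo.<String>toArrayList());
        System.out.println(result);  // [a, b, c]
    }
}
```

The mutable state here is created and merged by the stream framework through the supplier and combiner, so no container is shared between threads even in a parallel run.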

  • 2021-01-01 04:29

    When in doubt, simply check the documentation for the specific operation. Examples:

    1. Stream.map mapper parameter:

      mapper - a non-interfering, stateless function to apply to each element

      Here the documentation explicitly says that the function must be stateless.

    2. Stream.forEach action parameter:

      action - a non-interfering action to perform on the elements

      Here it's not specified that the action is stateless, thus it can be stateful.

    In general, each method's documentation states this explicitly.
