According to the OCP book, one must avoid stateful operations, otherwise known as stateful lambda expressions. The definition provided in the book is:
A stateful lambda expression is one whose result depends on any state that might change during the execution of a stream pipeline.
Let's understand this with an example here:
List<Integer> list = Arrays.asList(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);
List<Integer> result = new ArrayList<Integer>();
list.parallelStream().map(s -> {
    synchronized (result) {
        if (result.size() < 10) {
            result.add(s);
        }
    }
    return s;
}).forEach( e -> {});
System.out.println(result);
If you run this code several times, the output can be different each time. The reason is that the lambda expression inside map updates the result list, and whether an element gets added depends on the current size of that list. Which elements arrive while the size is still below 10 depends on how the parallel stream happens to schedule its substreams, and that can change on every run.
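For comparison, here is a minimal sketch of a stateless way to pick ten elements from the same list. It assumes that "the first ten in encounter order" is an acceptable replacement for "whichever ten arrive first"; the class and method names (StatelessFirstTen, firstTen) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StatelessFirstTen {
    // No shared list is touched: limit() and collect() do all the work,
    // so the result does not depend on thread scheduling.
    static List<Integer> firstTen(List<Integer> source) {
        return source.parallelStream()
                .limit(10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        System.out.println(firstTen(list)); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

Because the source list is ordered, limit(10) keeps the first ten elements and Collectors.toList() preserves their encounter order, even though the stream is parallel.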
For a better understanding of parallel streams: Parallel computing involves dividing a problem into subproblems, solving those problems simultaneously (in parallel, with each subproblem running in a separate thread), and then combining the results of the solutions to the subproblems. When a stream executes in parallel, the Java runtime partitions the stream into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.
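The split-then-combine idea can be sketched with the three-argument reduce, where the accumulator works inside one substream and the combiner merges the partial results. This is only an illustration; the class and method names (ParallelSumSketch, sum) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ParallelSumSketch {
    static int sum(List<Integer> values) {
        return values.parallelStream()
                .reduce(0,                                 // identity for each substream
                        (partial, x) -> partial + x,       // accumulate within a substream
                        Integer::sum);                     // combine substream results
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        System.out.println(sum(list)); // 120
    }
}
```

Both functions are stateless and addition is associative, so the result is the same no matter how the runtime partitions the list.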
Here is an example where a stateful operation returns a different result each time:
public static void main(String[] args) {
    Set<Integer> seen = new HashSet<>();
    IntStream stream = IntStream.of(1, 2, 3, 1, 2, 3);
    // Stateful lambda expression
    IntUnaryOperator mapUniqueLambda = (int i) -> {
        if (!seen.contains(i)) {
            seen.add(i);
            return i;
        }
        else {
            return 0;
        }
    };
    int sum = stream.parallel().map(mapUniqueLambda).peek(i -> System.out.println("Stream member: " + i)).sum();
    System.out.println("Sum: " + sum);
}
In my case when I ran the code I got the following output:
Stream member: 1
Stream member: 0
Stream member: 2
Stream member: 3
Stream member: 1
Stream member: 2
Sum: 9
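Why did I get 9 as the sum if I'm inserting into a HashSet?
The answer: different threads took different parts of the IntStream.
For example, the two occurrences of 1 and the two occurrences of 2 managed to end up on different threads. Each thread checked the set before the other had added the value, so both saw it as "not seen yet" and returned it. The contains-then-add sequence is not atomic, and HashSet itself is not thread-safe.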
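A stateless way to get the intended sum of unique values is to let the stream library remove duplicates instead of tracking them in a shared set. A minimal sketch follows; the class and method names (StatelessUniqueSum, uniqueSum) are made up for illustration.

```java
import java.util.stream.IntStream;

class StatelessUniqueSum {
    // distinct() is handled by the stream implementation itself,
    // so no lambda reads or writes shared mutable state.
    static int uniqueSum(int... values) {
        return IntStream.of(values)
                .parallel()
                .distinct()
                .sum();
    }

    public static void main(String[] args) {
        System.out.println("Sum: " + uniqueSum(1, 2, 3, 1, 2, 3)); // Sum: 6
    }
}
```

Note that distinct() is itself documented as a stateful intermediate operation, but the library manages that state safely; the advice in the book is about state that your own lambdas depend on.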
A stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline. On the other hand, a stateless lambda expression is one whose result does not depend on any state that might change during the execution of a pipeline.
Source: OCP: Oracle Certified Professional Java SE 8 Programmer II Study Guide: Exam 1Z0-809 by Jeanne Boyarsky, Scott Selikoff
List<Integer> data = Collections.synchronizedList(new ArrayList<>());
Arrays.asList(1, 2, 3, 4, 5, 6, 7).parallelStream()
    .map(i -> {
        data.add(i);
        return i;
    }) // AVOID STATEFUL LAMBDA EXPRESSIONS!
    .forEachOrdered(i -> System.out.print(i + " "));
System.out.println();
for (int e : data) {
    System.out.print(e + " ");
}
Possible Output:
1 2 3 4 5 6 7
1 7 5 2 3 4 6
It is strongly recommended that you avoid stateful operations when using parallel streams, so as to remove any potential data side effects. In fact, they should generally be avoided in serial streams wherever possible, since they prevent your streams from taking advantage of parallelization.
The first problem is this:
List<Integer> list = new ArrayList<>();
List<Integer> result = Stream.of(1, 2, 3, 4, 5, 6)
    .parallel()
    .map(x -> {
        list.add(x);
        return x;
    })
    .collect(Collectors.toList());
System.out.println(list);
You have no idea what the result will be here, since you are adding elements to a non-thread-safe collection, ArrayList.
But even if you do:
List<Integer> list = Collections.synchronizedList(new ArrayList<>());
and perform the same operation, list still has no predictable order, because multiple threads add to this synchronized collection. Using the synchronized collection guarantees that all elements are added (as opposed to the plain ArrayList), but the order in which they will be present is unknown.
Notice that list has no order guarantees whatsoever; its contents reflect what is called processing order. By contrast, result is guaranteed to be [1, 2, 3, 4, 5, 6] for this particular example, because collecting an ordered stream to a list preserves encounter order.
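The difference between the two lists can be observed with a small sketch like the one below. It keeps the stateful lambda on purpose, only to make the processing order visible; the class and method names (ProcessingVsEncounterOrder, collectInOrder) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class ProcessingVsEncounterOrder {
    // 'seen' receives elements in whatever order the threads process them;
    // the returned list is built by collect() and keeps encounter order.
    static List<Integer> collectInOrder(List<Integer> seen) {
        return Arrays.asList(1, 2, 3, 4, 5, 6).parallelStream()
                .map(x -> {
                    seen.add(x); // stateful side effect, for demonstration only
                    return x;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        List<Integer> result = collectInOrder(seen);
        System.out.println("processing order: " + seen);   // may vary between runs
        System.out.println("encounter order:  " + result); // [1, 2, 3, 4, 5, 6]
    }
}
```

The synchronized list always ends up with all six elements, but only the collected result has a predictable order.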
Depending on the problem, you can usually get rid of the stateful operations; for your example, returning the synchronized List would be:
Stream.of(1, 2, 3, 4, 5, 6)
    .filter(x -> x > 2) // for example a filter is present
    .collect(Collectors.collectingAndThen(Collectors.toList(),
        Collections::synchronizedList));
To try to give an example, let's consider the following Consumer (note: the usefulness of such a function is beside the point here):
public static class StatefulConsumer implements IntConsumer {
    private static final Integer ARBITRARY_THRESHOLD = 10;
    private boolean flag = false;
    private final List<Integer> list = new ArrayList<>();

    @Override
    public void accept(int value) {
        if(flag){ // exit condition
            return;
        }
        if(value >= ARBITRARY_THRESHOLD){
            flag = true;
        }
        list.add(value);
    }
}
It's a consumer that will add items to a List (let's not consider how to get the list back, nor thread safety) and has a flag (to represent the statefulness).
The logic behind this would be that once the threshold has been reached, the consumer should stop adding items.
What your book was trying to say is that, because there is no guaranteed order in which the function will consume the elements of the Stream, the output is non-deterministic.
Thus, the book advises you to use only stateless functions, meaning functions that always produce the same result for the same input.
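As a rough sketch of how the same "keep items up to and including the first one that reaches the threshold" logic could be written without a flag, one option is to locate the cut-off index first and then collect everything up to it. This assumes the input is available as an int array; the class and method names (StatelessThreshold, upToThreshold) are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StatelessThreshold {
    static List<Integer> upToThreshold(int[] values, int threshold) {
        // Index of the first value that reaches the threshold (in encounter order),
        // or the last index if no value reaches it.
        int cut = IntStream.range(0, values.length)
                .parallel()
                .filter(i -> values[i] >= threshold)
                .findFirst()
                .orElse(values.length - 1);
        // Keep everything up to and including that index.
        return IntStream.rangeClosed(0, cut)
                .parallel()
                .map(i -> values[i])
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upToThreshold(new int[] {3, 7, 12, 5, 20}, 10)); // [3, 7, 12]
    }
}
```

Neither lambda modifies anything, and findFirst() respects encounter order, so the result is the same whether the streams run sequentially or in parallel.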