Do sorted and distinct immediately process the stream?


孤城傲影 2021-02-07 12:24

Imagine I have something that looks like this:

Stream<Integer> stream = Stream.of(2,1,3,5,6,7,9,11,10)
            .distinct()
            .sorted();


      
      
        
2 answers

别那么骄傲 (original poster) 2021-02-07 12:43
              

            
            
                        
You have asked a loaded question, implying that there had to be a choice between two alternatives.

The stateful intermediate operations have to store data, in some cases up to the point of storing all elements before being able to pass an element downstream, but that doesn’t change the fact that this work is deferred until a terminal operation has been commenced.

It’s also not correct to say that it has to “traverse the stream twice”. These are entirely different traversals: in the case of sorted(), the first is a traversal of the source that fills an internal buffer, which is then sorted, and the second is a traversal of that buffer. In the case of distinct(), no second traversal happens in sequential processing; the internal HashSet is only used to decide whether to pass an element downstream.
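
The two kinds of traversal described above can be modelled roughly with plain collections. This is only a simplified sketch, not the JDK's implementation: the class and method names (StatefulStageSketch, distinctStage, sortedStage, distinctThenSorted) are made up for illustration, and the downstream stage is reduced to a Consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of the two stateful operations; not the JDK's code.
public class StatefulStageSketch {

    // Models distinct(): a single pass that forwards each unseen element at once.
    static <T> void distinctStage(Iterable<T> source, Consumer<T> downstream) {
        Set<T> seen = new HashSet<>();
        for (T element : source) {
            if (seen.add(element)) {      // add() returns false for a duplicate
                downstream.accept(element);
            }
        }
    }

    // Models sorted(): fill a buffer from the source, sort it, then traverse the buffer.
    static <T extends Comparable<T>> void sortedStage(Iterable<T> source, Consumer<T> downstream) {
        List<T> buffer = new ArrayList<>();
        for (T element : source) {
            buffer.add(element);          // nothing is forwarded during this pass
        }
        Collections.sort(buffer);
        for (T element : buffer) {
            downstream.accept(element);   // second traversal: over the buffer, not the source
        }
    }

    // Chains the two models; the intermediate list only keeps the sketch short.
    static List<Integer> distinctThenSorted(List<Integer> source) {
        List<Integer> unique = new ArrayList<>();
        distinctStage(source, unique::add);
        List<Integer> result = new ArrayList<>();
        sortedStage(unique, result::add);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinctThenSorted(Arrays.asList(2, 1, 3, 5, 3))); // prints [1, 2, 3, 5]
    }
}
```

In this model the source is walked exactly once by each stage; the extra loop in sortedStage runs over the buffer only.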

So when you run

Stream<Integer> stream = Stream.of(2,1,3,5,3)
    .peek(i -> System.out.println("source: "+i))
    .distinct()
    .peek(i -> System.out.println("distinct: "+i))
    .sorted()
    .peek(i -> System.out.println("sorted: "+i));
System.out.println("commencing terminal operation");
stream.forEachOrdered(i -> System.out.println("terminal: "+i));


it prints

commencing terminal operation
source: 2
distinct: 2
source: 1
distinct: 1
source: 3
distinct: 3
source: 5
distinct: 5
source: 3
sorted: 1
terminal: 1
sorted: 2
terminal: 2
sorted: 3
terminal: 3
sorted: 5
terminal: 5


showing that nothing happens before the terminal operation has been commenced and that elements from the source immediately pass the distinct() operation (unless they are duplicates), whereas all elements are buffered in the sorted() operation before being passed downstream.

It can further be shown that distinct() does not need to traverse the entire stream:

Stream.of(2,1,1,3,5,6,7,9,2,1,3,5,11,10)
    .peek(i -> System.out.println("source: "+i))
    .distinct()
    .peek(i -> System.out.println("distinct: "+i))
    .filter(i -> i>2)
    .findFirst().ifPresent(i -> System.out.println("found: "+i));


prints

source: 2
distinct: 2
source: 1
distinct: 1
source: 1
source: 3
distinct: 3
found: 3
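
For contrast, here is a sketch of the same pipeline with sorted() in place of distinct(). It records every element pulled from the source instead of printing it. The class and method names are invented for this example, and the count reflects the sequential OpenJDK behaviour described above, where sorted() buffers all elements before passing any downstream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Illustrative only: shows that sorted() drains the source even before findFirst().
public class SortedShortCircuitSketch {

    // Returns the source elements in the order they were pulled through peek().
    static List<Integer> consumedBeforeFirstResult() {
        List<Integer> consumed = new ArrayList<>();
        Optional<Integer> first = Stream.of(2, 1, 1, 3, 5, 6, 7, 9, 2, 1, 3, 5, 11, 10)
            .peek(consumed::add)          // records each element taken from the source
            .sorted()                     // must see every element before emitting one
            .filter(i -> i > 2)
            .findFirst();
        System.out.println("found: " + first.orElseThrow(IllegalStateException::new));
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println("consumed: " + consumedBeforeFirstResult().size());
    }
}
```

Here all 14 source elements are consumed before the first result appears, whereas the distinct() version above stopped after four.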


As explained and demonstrated by Jose Da Silva’s answer, the amount of buffering may change with ordered parallel streams, as partial results must be adjusted before they can get passed to downstream operations.
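
A minimal way to observe this yourself is to run the earlier distinct() pipeline in parallel. The sketch below uses a made-up class name; the order and interleaving of the printed lines depend on the implementation and on thread scheduling, so no expected console output is given. Only the collected result is fixed, because distinct() on an ordered stream keeps the first occurrence of each element in encounter order.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: printed lines may appear in any order on a parallel stream.
public class ParallelDistinctSketch {

    static List<Integer> run() {
        return Stream.of(2, 1, 3, 5, 3)
            .parallel()
            .peek(i -> System.out.println("source: " + i))
            .distinct()
            .peek(i -> System.out.println("distinct: " + i))
            .collect(Collectors.toList());   // ordered stream: result keeps encounter order
    }

    public static void main(String[] args) {
        System.out.println("result: " + run());
    }
}
```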

Since these operations do not happen before the actual terminal operation is known, there are more optimizations possible than currently happen in OpenJDK (but may happen in different implementations or future versions). E.g. sorted().toArray() may use and return the same array or sorted().findFirst() may turn into a min(), etc.
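
As a small illustration of the last point, the two pipelines below produce the same result, which is why a library could in principle replace one with the other. The class and method names are invented for this sketch, and it makes no claim about what any particular JDK version actually does internally.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Illustrative only: compares the results, not the internal execution strategy.
public class MinVersusSortedFirst {

    // Buffers and sorts every element, then takes the first one.
    static Optional<Integer> viaSorted() {
        return Stream.of(2, 1, 3, 5, 3).sorted().findFirst();
    }

    // Keeps only the smallest element seen so far; no buffer is needed.
    static Optional<Integer> viaMin() {
        return Stream.of(2, 1, 3, 5, 3).min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        System.out.println(viaSorted()); // prints Optional[1]
        System.out.println(viaMin());    // prints Optional[1]
    }
}
```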
    
             
                                                        
            
            
              
                