Spark sort by key and then group by to get ordered iterable?

悲&欢浪女 2021-01-03 08:10
I have a Pair RDD (K, V) with the key containing a time and an ID. I would like to get a Pair RDD of the form (K, Iterable

      
      
        
2 answers

伪装坚强ぢ (original poster) 2021-01-03 08:43
              

            
            
                        
The answer from Matei, whom I consider authoritative on this topic, is quite clear:


  The order is not guaranteed actually, only which keys end up in each
  partition. Reducers may fetch data from map tasks in an arbitrary
  order, depending on which ones are available first. If you’d like a
  specific order, you should sort each partition. Here you might be
  getting it because each partition only ends up having one element, and
  collect() does return the partitions in order.
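For reference, the "sort each partition" advice in the quote could be sketched as below. This is only an illustration under assumptions: the key is taken to be a composite (id, time) tuple, the sample data is made up, and IdPartitioner is a hypothetical helper name. It relies on Spark's repartitionAndSortWithinPartitions, which shuffles by the given partitioner and sorts each partition by key.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical partitioner: routes a composite (id, time) key by its id only,
// so all records for one id land in the same partition.
class IdPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case (id: String, _) => java.lang.Math.floorMod(id.hashCode, numPartitions)
  }
}

object SortWithinPartitions {
  // Shuffles by id, then sorts every partition by the full (id, time) key.
  def sortById(events: RDD[((String, Long), String)],
               partitions: Int): RDD[((String, Long), String)] =
    events.repartitionAndSortWithinPartitions(new IdPartitioner(partitions))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("sort-within-partitions")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data: ((id, time), payload)
      val events = sc.parallelize(Seq(
        (("a", 3L), "x3"), (("b", 1L), "y1"), (("a", 1L), "x1"), (("b", 2L), "y2")
      ))
      // glom() exposes each partition as an array so its internal order is visible.
      sortById(events, 2).glom().collect().zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

Within each partition the records for one id are contiguous and ordered by time, so they can be consumed in order with mapPartitions without collecting every value for a key in memory.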


In that context, a better option would be to apply the sorting to the resulting collections per key:

rdd.groupByKey().mapValues(_.toSeq.sorted)
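A minimal, self-contained sketch of this per-key sorting, run in local mode. The sample data, the (id, time) shape of the pairs, and the names SortedGroups and sortValuesPerKey are assumptions for illustration, not taken from the question. The values are converted with toSeq first because sorted is defined on sequences rather than on a plain Iterable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortedGroups {
  // Groups values by key, then sorts each group's values locally.
  // The order is imposed after the shuffle, so it does not depend on
  // the order in which reducers happened to fetch the data.
  def sortValuesPerKey(rdd: RDD[(String, Long)]): RDD[(String, Seq[Long])] =
    rdd.groupByKey().mapValues(_.toSeq.sorted)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("sorted-groups")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data: (id, time), spread over several partitions.
      val rdd = sc.parallelize(
        Seq(("a", 3L), ("b", 1L), ("a", 1L), ("b", 2L), ("a", 2L)),
        numSlices = 3
      )
      sortValuesPerKey(rdd).collect().foreach { case (id, times) =>
        println(s"$id -> ${times.mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

Note that groupByKey holds all values of a key in memory at once, so this approach fits best when no single key has a very large number of values.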

    
             
                                                        
            
            
              
                