Spark sort by key and then group by to get ordered iterable?

悲&欢浪女 2021-01-03 08:10
I have a Pair RDD (K, V) with the key containing a time and an ID. I would like to get a Pair RDD of the form (K, Iterable

      
      
        
2 answers

一整个雨季 (original poster) 2021-01-03 09:03

The Spark Programming Guide offers three alternatives if one desires predictably ordered data following shuffle:


  
  mapPartitions to sort each partition using, for example, .sorted
  repartitionAndSortWithinPartitions to efficiently sort partitions while simultaneously repartitioning
  sortBy to make a globally ordered RDD
  


As written in the Spark API, repartitionAndSortWithinPartitions is more efficient than calling repartition and then sorting within each partition because it can push the sorting down into the shuffle machinery.
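
A minimal PySpark sketch of the three options above, assuming an existing pair RDD of (key, value) tuples. The function names are illustrative and not part of Spark; only mapPartitions, repartitionAndSortWithinPartitions and sortBy are Spark's own RDD methods.

```python
def sort_partition(records):
    """Sort the (key, value) pairs of a single partition by key."""
    return iter(sorted(records, key=lambda kv: kv[0]))


def sort_each_partition(rdd):
    # Option 1: no shuffle; each existing partition is sorted locally.
    # Keys are unchanged, so the existing partitioning can be preserved.
    return rdd.mapPartitions(sort_partition, preservesPartitioning=True)


def repartition_and_sort(rdd, num_partitions):
    # Option 2: a single shuffle that repartitions by key and sorts
    # each resulting partition by key as part of that shuffle.
    return rdd.repartitionAndSortWithinPartitions(numPartitions=num_partitions)


def sort_globally(rdd):
    # Option 3: total order by key across all partitions.
    return rdd.sortBy(lambda kv: kv[0])
```

Options 1 and 2 only guarantee order inside each partition; only option 3 gives an order that holds across the whole RDD.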

The sorting, however, is computed by looking only at the keys K of the tuples (K, V). The trick is to put all the relevant information in the first element of the tuple, like ((K, V), null), and to define a custom partitioner and a custom ordering. This article describes the technique pretty well.
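
The technique could look roughly like the PySpark sketch below. It assumes the records are shaped as ((id, time), value), so the key already carries everything the sort needs; that layout, the helper names, and the CRC32-based partitioner are assumptions made for illustration, not something taken from the question or from Spark. In PySpark the custom partitioner is the partitionFunc argument and the custom ordering is the keyfunc argument.

```python
import zlib
from itertools import groupby


def partition_by_id(key):
    """Hypothetical partitioner: hash the id only, ignoring the time,
    so all records for one id end up in the same partition."""
    record_id, _time = key
    return zlib.crc32(str(record_id).encode("utf-8"))


def group_sorted_partition(records):
    """Turn a partition already sorted by (id, time) into
    (id, [(time, value), ...]) pairs, one per id."""
    for record_id, run in groupby(records, key=lambda kv: kv[0][0]):
        yield record_id, [(key[1], value) for key, value in run]


def grouped_and_ordered(rdd, num_partitions):
    """rdd is assumed to hold ((id, time), value) pairs."""
    shuffled = rdd.repartitionAndSortWithinPartitions(
        numPartitions=num_partitions,
        partitionFunc=partition_by_id,
        keyfunc=lambda key: key,  # tuple comparison: id first, then time
    )
    # Records for one id are now contiguous and time-ordered, so a single
    # pass per partition is enough to build the grouped output.
    return shuffled.mapPartitions(group_sorted_partition)
```

Because grouping happens in one pass over already sorted data, the values for each id come out in time order without a separate groupByKey, which makes no promise about the order of the values it collects.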
    
             
                                                        
            
            
              
                