Spark RDD - Mapping with extra arguments


Is it possible to pass extra arguments to the mapping function in pySpark? Specifically, I have the following code recipe:

raw_data_rdd = sc.textFile("data.json")
json_data_rdd = raw_data_rdd.map(lambda line: json.loads(line))
mapped_rdd = json_data_rdd.flatMap(processDataLine)

How can I pass extra arguments to processDataLine, for example so that it can be called as processDataLine(dataline, arg1, arg2)?
        
1 Answer

    1. You can use an anonymous function either directly in a flatMap

      json_data_rdd.flatMap(lambda j: processDataLine(j, arg1, arg2))
      

      or to curry processDataLine

      f = lambda j: processDataLine(j, arg1, arg2)
      json_data_rdd.flatMap(f)
      
    2. You can generate processDataLine like this:

      def processDataLine(arg1, arg2):
          def _processDataLine(dataline):
              return ... # Do something with dataline, arg1, arg2
          return _processDataLine
      
      json_data_rdd.flatMap(processDataLine(arg1, arg2))
      
    3. The toolz library provides a useful curry decorator:

      from toolz.functoolz import curry
      
      @curry
      def processDataLine(arg1, arg2, dataline): 
          return ... # Do something with dataline, arg1, arg2
      
      json_data_rdd.flatMap(processDataLine(arg1, arg2))
      

      Note that I've moved the dataline argument to the last position. This is not required, but this way we don't have to use keyword arguments.

    4. Finally, there is functools.partial, already mentioned by Avihoo Mamka in the comments.
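
      Here is a minimal sketch of that approach (the processDataLine body is a placeholder, assuming the same (dataline, arg1, arg2) signature as above). Since dataline is the first positional parameter, the extra arguments are bound by keyword:

      from functools import partial

      def processDataLine(dataline, arg1, arg2):
          return ...  # Do something with dataline, arg1, arg2

      # Bind arg1 and arg2 by keyword, so dataline stays free as the
      # single positional argument supplied by flatMap
      json_data_rdd.flatMap(partial(processDataLine, arg1=arg1, arg2=arg2))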
