When submitting a job with pyspark, how can I access static files uploaded with the --files argument?

佛祖请我去吃肉 2021-02-08 00:53

For example, I have a folder:

/
  - test.py
  - test.yml

and the job is submitted to the Spark cluster with:

gcloud beta dataproc jobs
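
The full submit command would look roughly like the following (the cluster and region names here are only placeholders, not values from my setup):

    gcloud beta dataproc jobs submit pyspark test.py \
        --cluster=my-cluster \
        --region=us-central1 \
        --files=test.yml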

3 Answers
  •  栀梦 (OP) 2021-02-08 01:48

    Yep, Shagun is right.

    Basically, when you submit a Spark job, Spark does not ship the file you want processed over to each worker. You will have to make it available yourself.

    Typically, you will have to put the file in a shared file system such as HDFS, Amazon S3, or any other DFS that all the workers can access. Once you do that and reference that location in your Spark script, the job will be able to read and process the file as you wish.
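
    As a minimal sketch, assuming test.yml has been copied to a GCS bucket (the bucket name below is made up for illustration) and that PyYAML is available on the cluster, the PySpark script could read and parse it like this:

        import yaml
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("static-file-demo").getOrCreate()

        # wholeTextFiles yields (path, content) pairs; collect the single
        # file on the driver and parse it there.
        raw = spark.sparkContext.wholeTextFiles("gs://my-bucket/config/test.yml").collect()[0][1]
        config = yaml.safe_load(raw)

        # Broadcast the parsed config so executors can use it in transformations.
        config_bc = spark.sparkContext.broadcast(config)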

    That said, copying the file to the same path on ALL of your workers' and the master's file systems also works. For example, you can create a folder like /opt/spark-job/all-files/ on ALL Spark nodes, rsync the file to each of them, and then use that local path in your Spark script. But please do not do this; a DFS or S3 is a much better approach.
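
    If you did go that route, the script could open the local path with plain file I/O, since the same path exists on every node. A rough sketch (again using the example folder above, and assuming PyYAML is installed on every node):

        import yaml
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()

        def load_local_config(_):
            # Runs on an executor; a plain open() works because the same
            # path exists on every node.
            with open("/opt/spark-job/all-files/test.yml") as f:
                return [yaml.safe_load(f)]

        # One-partition RDD just to execute the function on an executor.
        config = spark.sparkContext.parallelize([0], 1).mapPartitions(load_local_config).collect()[0]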
