submit .py script on Spark without Hadoop installation

你的背包 2021-01-03 05:52

I have the following simple wordcount Python script.

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")

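A minimal complete version of such a word-count script might look like this (everything after the configuration lines, including the input path input.txt, is illustrative, since the snippet above is cut off):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)

# Split each line into words, pair each word with a count of 1,
# and sum the counts per word.
counts = (sc.textFile("input.txt")  # hypothetical input file
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

print(counts.collect())
sc.stop()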
1 Answer
  • 2021-01-03 06:07

    The good news is you're not doing anything wrong, and your code will run after the error is mitigated.

    Despite the statement that Spark will run on Windows without Hadoop, it still looks for some Hadoop components. The bug has a JIRA ticket (SPARK-2356), and a patch is available. As of Spark 1.3.1, the patch hasn't been committed to the main branch yet.

    Fortunately, there's a fairly easy workaround.

    1. Create a bin directory for winutils under your Spark installation directory. In my case, Spark is installed in D:\Languages\Spark, so I created the following path: D:\Languages\Spark\winutils\bin

    2. Download the winutils.exe from Hortonworks and put it into the bin directory created in the first step. Download link for Win64: http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe

    3. Create a "HADOOP_HOME" environment variable that points to the winutils directory (not the bin subdirectory). You can do this in a couple of ways:

      • a. Establish a permanent environment variable via the Control Panel -> System -> Advanced System Settings -> Advanced Tab -> Environment variables. You can create either a user variable or a system variable with the following parameters:

        Variable Name=HADOOP_HOME
        Variable Value=D:\Languages\Spark\winutils\

      • b. Set a temporary environment variable inside your command shell before executing your script (a script-level alternative is sketched after this list):

        set HADOOP_HOME=D:\Languages\Spark\winutils

    4. Run your code. It should work without error now.
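    As a sketch of an alternative to step 3 (not part of the original workaround, and reusing the example path from step 1): you can also set the variable from inside the script itself, as long as it happens before the SparkContext is created, since the environment is read when the JVM is launched.

        import os

        # Must run before SparkContext is created: the Hadoop libraries read
        # HADOOP_HOME when the JVM starts. The path is the example from step 1;
        # adjust it to your own layout.
        os.environ["HADOOP_HOME"] = r"D:\Languages\Spark\winutils"

        from pyspark import SparkConf, SparkContext
        conf = SparkConf().setMaster("local").setAppName("My App")
        sc = SparkContext(conf=conf)

    With HADOOP_HOME set by any of these methods, the script is launched as usual, e.g. spark-submit wordcount.py (the file name here is just an example).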
