submit .py script on Spark without Hadoop installation

你的背包 2021-01-03 05:52

I have the following simple wordcount Python script.

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")

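A minimal complete version of such a word-count script might look like this (everything after the configuration lines, including the input path input.txt, is illustrative, since the snippet above is cut off):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)

# Split each line into words, pair each word with a count of 1,
# and sum the counts per word.
counts = (sc.textFile("input.txt")  # hypothetical input file
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

print(counts.collect())
sc.stop()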
1 Answer
  • 2021-01-03 06:07

    The good news is you're not doing anything wrong, and your code will run after the error is mitigated.

    Despite the statement that Spark will run on Windows without Hadoop, it still looks for some Hadoop components. The bug has a JIRA ticket (SPARK-2356), and a patch is available. As of Spark 1.3.1, the patch hasn't been committed to the main branch yet.

    Fortunately, there's a fairly easy workaround.

    1. Create a bin directory for winutils under your Spark installation directory. In my case, Spark is installed in D:\Languages\Spark, so I created the following path: D:\Languages\Spark\winutils\bin

    2. Download the winutils.exe from Hortonworks and put it into the bin directory created in the first step. Download link for Win64: http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe

    3. Create a "HADOOP_HOME" environment variable that points to the winutils directory (not the bin subdirectory). You can do this in a couple of ways:

      • a. Establish a permanent environment variable via the Control Panel -> System -> Advanced System Settings -> Advanced Tab -> Environment variables. You can create either a user variable or a system variable with the following parameters:

        Variable Name=HADOOP_HOME
        Variable Value=D:\Languages\Spark\winutils\

      • b. Set a temporary environment variable inside your command shell before executing your script (a script-level alternative is sketched after this list):

        set HADOOP_HOME=D:\Languages\Spark\winutils

    4. Run your code. It should work without error now.
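    As a sketch of an alternative to step 3 (not part of the original workaround, and reusing the example path from step 1): you can also set the variable from inside the script itself, as long as it happens before the SparkContext is created, since the environment is read when the JVM is launched.

        import os

        # Must run before SparkContext is created: the Hadoop libraries read
        # HADOOP_HOME when the JVM starts. The path is the example from step 1;
        # adjust it to your own layout.
        os.environ["HADOOP_HOME"] = r"D:\Languages\Spark\winutils"

        from pyspark import SparkConf, SparkContext
        conf = SparkConf().setMaster("local").setAppName("My App")
        sc = SparkContext(conf=conf)

    With HADOOP_HOME set by any of these methods, the script is launched as usual, e.g. spark-submit wordcount.py (the file name here is just an example).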
