I have the following simple wordcount Python script.
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
The good news is that you're not doing anything wrong, and your code will run once the error is worked around.
Despite the statement that Spark will run on Windows without Hadoop, it still looks for some Hadoop components. The bug has a JIRA ticket (SPARK-2356), and a patch is available. As of Spark 1.3.1, the patch hasn't been committed to the main branch yet.
Fortunately, there's a fairly easy workaround:
Create a bin directory for winutils under your Spark installation directory. In my case, Spark is installed in D:\Languages\Spark, so I created the following path: D:\Languages\Spark\winutils\bin
Download the winutils.exe from Hortonworks and put it into the bin directory created in the first step. Download link for Win64: http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
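If you'd rather script those two steps, here's a rough Python sketch (assuming Python 3; the path and URL simply mirror the example above, so adjust them to your own installation):

import os
import urllib.request

# Create the winutils\bin directory under the Spark installation (path from the example above).
winutils_bin = r"D:\Languages\Spark\winutils\bin"
os.makedirs(winutils_bin, exist_ok=True)

# Download winutils.exe from the Hortonworks link into that directory.
url = "http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe"
urllib.request.urlretrieve(url, os.path.join(winutils_bin, "winutils.exe"))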
Create a "HADOOP_HOME" environment variable that points to the winutils directory (not the bin subdirectory). You can do this in a couple of ways:
a. Establish a permanent environment variable via the Control Panel -> System -> Advanced System Settings -> Advanced tab -> Environment Variables. You can create either a user variable or a system variable with the following parameters:
Variable Name=HADOOP_HOME
Variable Value=D:\Languages\Spark\winutils\
b. Set a temporary environment variable inside your command shell before executing your script:
set HADOOP_HOME=D:\Languages\Spark\winutils
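A third option, not strictly part of the steps above, is to set the variable from inside the Python script itself, as long as you do it before the SparkContext is created. A minimal sketch, assuming the same winutils path:

import os

# HADOOP_HOME must point at the winutils directory (not the bin subdirectory) and
# must be set before the SparkContext starts up; adjust the path to your own install.
os.environ["HADOOP_HOME"] = r"D:\Languages\Spark\winutils"

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)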
Run your code. It should work without error now.
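For reference, a minimal complete wordcount script along the lines of the snippet in the question might look like this (input.txt and the word-splitting logic are placeholders; adapt them to your data):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)

# Split each line into words, pair each word with a count of 1, then sum per word.
lines = sc.textFile("input.txt")
counts = lines.flatMap(lambda line: line.split()) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda a, b: a + b)

for word, count in counts.collect():
    print("%s: %d" % (word, count))

sc.stop()

You can launch it with spark-submit, e.g. spark-submit wordcount.py, where wordcount.py is just an example filename.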