I am having trouble starting spark-shell on my Windows computer. The version of Spark I am using is 1.5.2, pre-built for Hadoop 2.4 and later. I think spark-shell.cmd could
There are a couple of issues. You're on Windows, and things work differently on this OS compared to POSIX-compliant OSes.
Start by reading the Problems running Hadoop on Windows document and see whether "missing WINUTILS.EXE" is your issue. Make sure you run spark-shell in a console with admin rights.
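If winutils.exe is indeed missing, the usual remedy is to download a winutils.exe built for your Hadoop version, place it in a bin directory, and point HADOOP_HOME at that directory's parent. A sketch in cmd (the C:\hadoop location is just an example, not anything Spark requires):

rem winutils.exe should end up at %HADOOP_HOME%\bin\winutils.exe
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%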
You may also want to read the answers to a similar question: Why does starting spark-shell fail with NullPointerException on Windows?
Also, you may have started spark-shell from inside the bin subdirectory, which would explain errors like:
15/11/18 17:51:39 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar."
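If that's what happened, launch from the top-level Spark directory instead (using the install path shown in the warning above):

cd C:\spark-1.5.2-bin-hadoop2.4
bin\spark-shell.cmd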
And the last issue:
15/11/18 17:51:47 WARN : Your hostname, Lenovo-PC resolves to a loopback/non-reachable address: fe80:0:0:0:297a:e76d:828:59dc%wlan2, but we couldn't find any external IP address!
One workaround is to set SPARK_LOCAL_HOSTNAME to some resolvable host name and be done with it.
SPARK_LOCAL_HOSTNAME is a custom host name that overrides any other hostname candidates when the driver, master, workers, and executors are created. In your case, using spark-shell, just execute the following:
SPARK_LOCAL_HOSTNAME=localhost ./bin/spark-shell
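That inline-assignment syntax only works in a POSIX shell; on Windows, the equivalent in cmd would be:

set SPARK_LOCAL_HOSTNAME=localhost
bin\spark-shell.cmd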
You can also use:
./bin/spark-shell --conf spark.driver.host=localhost
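To make this permanent rather than per-invocation, the same property can go into conf/spark-defaults.conf (create it from conf/spark-defaults.conf.template if it doesn't exist yet):

spark.driver.host  localhost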
Refer also to Environment Variables in the official Spark documentation.
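Environment variables such as SPARK_LOCAL_HOSTNAME can also be set once in conf/spark-env.sh (created from conf/spark-env.sh.template), which Spark sources at startup; on Windows, if I recall the 1.x launch scripts correctly, the equivalent file is conf\spark-env.cmd:

rem conf\spark-env.cmd (Windows)
set SPARK_LOCAL_HOSTNAME=localhost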