I\'ve downloaded the prebuild version of spark 1.4.0 without hadoop (with user-provided Haddop). When I ran the spark-shell command, I got this error:
> E
linux
ENV SPARK_DIST_CLASSPATH="$HADOOP_HOME/etc/hadoop/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/tools/lib/*"
windows
set SPARK_DIST_CLASSPATH=%HADOOP_HOME%\etc\hadoop\*;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\mapreduce\lib\*;%HADOOP_HOME%\share\hadoop\mapreduce\*;%HADOOP_HOME%\share\hadoop\tools\lib\*
I had the same issue ....Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/ FSDataInputStream at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSpa rkProperties$1.apply(SparkSubmitArguments.scala:111)... Then I realized that I had installed the spark version without hadoop. I installed the "with-hadoop" version the problem went away.
Run below from your package dir just before running spark-submit -
export SPARK_DIST_CLASSPATH=`hadoop classpath`
I got this error because the file was copied from Windows. Resolve it using
dos2unix file_name
I ran into the same error when trying to get familiar with spark. My understanding of the error message is that while spark doesn't need a hadoop cluster to run, it does need some of the hadoop classes. Since I was just playing around with spark and didn't care what version of hadoop libraries are used, I just downloaded a spark binary pre-built with a version of hadoop (2.6) and things started working fine.
I finally find a solution to remove the exception.
In spark-class2.cmd, add :
set HADOOP_CLASS1=%HADOOP_HOME%\share\hadoop\common\*
set HADOOP_CLASS2=%HADOOP_HOME%\share\hadoop\common\lib\*
set HADOOP_CLASS3=%HADOOP_HOME%\share\hadoop\mapreduce\*
set HADOOP_CLASS4=%HADOOP_HOME%\share\hadoop\mapreduce\lib\*
set HADOOP_CLASS5=%HADOOP_HOME%\share\hadoop\yarn\*
set HADOOP_CLASS6=%HADOOP_HOME%\share\hadoop\yarn\lib\*
set HADOOP_CLASS7=%HADOOP_HOME%\share\hadoop\hdfs\*
set HADOOP_CLASS8=%HADOOP_HOME%\share\hadoop\hdfs\lib\*
set CLASSPATH=%HADOOP_CLASS1%;%HADOOP_CLASS2%;%HADOOP_CLASS3%;%HADOOP_CLASS4%;%HADOOP_CLASS5%;%HADOOP_CLASS6%;%HADOOP_CLASS7%;%HADOOP_CLASS8%;%LAUNCH_CLASSPATH%
Then, change :
"%RUNNER%" -cp %CLASSPATH%;%LAUNCH_CLASSPATH% org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
to :
"%RUNNER%" -Dhadoop.home.dir=*hadoop-installation-folder* -cp %CLASSPATH% %JAVA_OPTS% %*
It works fine with me, but I'm not sure this is the best solution.