I was trying to run spark-submit and I get "Failed to find Spark assembly JAR. You need to build Spark before running this program." I get the same error when I try to run spark-shell.
If your Spark binaries are in a folder whose name contains spaces (for example, "Program Files (x86)"), it won't work. I renamed the folder to "Program_Files", and then the spark-shell command worked in cmd.
If you downloaded the binary distribution and are getting this exception, check whether your SPARK_HOME path contains spaces, like "apache spark"/bin.
Just removing the spaces will make it work.
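A quick way to check for this from a POSIX-style shell (Git Bash, WSL, Linux, macOS) might look like the sketch below. The helper name path_has_space is made up for illustration, and it only reports the problem; renaming or moving the folder is still up to you.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given path contains a space.
path_has_space() {
  case "$1" in
    *" "*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Check the current SPARK_HOME (treated as empty if it is unset) and report.
if path_has_space "${SPARK_HOME:-}"; then
  echo "SPARK_HOME contains a space: ${SPARK_HOME}"
else
  echo "SPARK_HOME has no spaces"
fi
```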
Go to SPARK_HOME. Note that your SPARK_HOME variable should not include /bin at the end.
Add /bin only when you put it on your PATH, like this: export PATH=$SPARK_HOME/bin:$PATH
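If you are not sure whether your SPARK_HOME already ends in /bin, something like this should normalize it before you touch PATH. This is only a sketch: strip_bin_suffix is a made-up helper name, and /opt/spark is a placeholder fallback, not a path from this thread.

```shell
# Hypothetical helper: print the given path without a trailing /bin (or /bin/).
strip_bin_suffix() {
  p=${1%/}        # drop one trailing slash, if any
  p=${p%/bin}     # drop a trailing /bin, if any
  printf '%s\n' "$p"
}

# /opt/spark is a placeholder used only when SPARK_HOME is unset.
SPARK_HOME=$(strip_bin_suffix "${SPARK_HOME:-/opt/spark}")
export SPARK_HOME

# /bin is added here, on PATH, and not in SPARK_HOME itself.
export PATH="$SPARK_HOME/bin:$PATH"
```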
Run export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" to allot more memory to Maven.
Run ./build/mvn -DskipTests clean package and be patient. It took my system 1 hour and 17 minutes to finish this.
Run ./dev/make-distribution.sh --name custom-spark --pip. This is just for Python/PySpark. You can add more flags for Hive, Kubernetes, etc.
Running pyspark or spark-shell will now start PySpark and Spark respectively.
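The build steps above can be sketched as one script. The run and build_spark helpers and the DRY_RUN switch are my own additions for illustration. As written it only prints the commands; set DRY_RUN=0 to actually run them, and expect the Maven step to take a long time. It assumes SPARK_HOME points at the Spark source root and falls back to the current directory otherwise.

```shell
# Hypothetical wrapper around the build steps described above.
# With DRY_RUN=1 (the default here) each command is only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_spark() {
  # Work from the Spark source root; SPARK_HOME is assumed to point there.
  run cd "${SPARK_HOME:-.}" || return 1

  # Give Maven more memory before building.
  export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"

  # Build Spark without running the test suite.
  run ./build/mvn -DskipTests clean package || return 1

  # Create a distribution; --pip also builds the PySpark package.
  run ./dev/make-distribution.sh --name custom-spark --pip || return 1
}

build_spark
```

To run it for real, save it as a script and call it with DRY_RUN=0 set in the environment. Extra make-distribution.sh flags (for Hive, Kubernetes, etc.) would go on the last run line.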