I am trying to run a unit test of a Spark job on Windows 7 64-bit. I have
HADOOP_HOME=D:/winutils
winutils path = D:/winutils/bin/winutils.exe
I r
After following the above suggestions, I made the changes below:
Update spark-defaults.conf, or create a copy of spark-defaults.conf.template and rename it to spark-defaults.conf.
Add the following line, which sets the temp folder for Spark to use (a programmatic alternative is sketched after these steps):
spark.local.dir=E:\spark2.4.6\tempDir
Similarly, update log4j.properties in your Spark setup (copying and renaming the template, as above, if needed) with the lines below:
log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
log4j.logger.org.apache.spark.SparkEnv=ERROR
Now the ShutdownHookManager logger is silenced on exit, so those error lines no longer show up on the console.
So how do we clean up the temp folder? For that, add the lines below to the bin/spark-shell.cmd file:
rmdir /q /s "E:\spark2.4.6\tempDir"
del C:\Users\nitin\AppData\Local\Temp\jansi*.*
With these updates in place, I see a clean exit and the temp folders are cleaned up as well.
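If you are driving Spark from a unit test rather than from spark-shell, the same temp-directory setting can also be applied programmatically when the session is built; a minimal sketch, assuming the same path as above and an illustrative app name:

import org.apache.spark.sql.SparkSession

// Local session for a test; spark.local.dir points Spark's scratch space
// at a directory we control, matching the spark-defaults.conf entry above.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("spark-unit-test")
  .config("spark.local.dir", "E:\\spark2.4.6\\tempDir")
  .getOrCreate()

// ... run the code under test ...

spark.stop()

Note that spark.local.dir is read when the SparkContext starts, so it has to be set before getOrCreate() is called.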
My Hadoop environment on Windows 10:
HADOOP_HOME=C:\hadoop
Spark and Scala versions:
Spark-2.3.1 and Scala-2.11.8
Below is my spark-submit command:
spark-submit --class SparkScalaTest --master local[*] D:\spark-projects\SparkScalaTest\target\scala-2.11\sparkscalatest_2.11-0.1.jar D:\HDFS\output
Based on my Hadoop environment on Windows 10, I defined the following system properties in my Scala main class:
System.setProperty("hadoop.home.dir", "C:\\hadoop\\")
System.setProperty("hadoop.tmp.dir", "C:\\hadoop\\tmp")
Result: I am getting the same error, but my output is generated in the output path D:\HDFS\output that was passed to spark-submit.
Hope this helps to bypass this error and get the expected result for Spark running locally on Windows.
I have a workaround for this: instead of letting Spark's ShutdownHookManager delete the temporary directories, you can issue Windows commands to do that.
Steps:
1. Change the temp directory using spark.local.dir in the spark-defaults.conf file.
2. Set log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF in the log4j.properties file.
3. spark-shell internally calls the spark-shell.cmd file, so add rmdir /q /s "your_dir\tmp" to that file.
This should work!
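If you run your own driver program instead of spark-shell, the same idea of cleaning the scratch directory yourself (rather than relying on Spark's shutdown hook) can also be done from Scala after spark.stop(); a rough sketch, assuming the directory is whatever spark.local.dir you configured:

import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._
import scala.util.Try

// Walk the temp directory and delete children before parents;
// Try swallows failures for files that are still locked.
def deleteRecursively(root: Path): Unit = {
  if (Files.exists(root)) {
    val entries = Files.walk(root).iterator().asScala.toList
    entries.reverse.foreach(p => Try(Files.deleteIfExists(p)))
  }
}

// "your_dir\tmp" stands for the configured spark.local.dir.
deleteRecursively(Paths.get("your_dir\\tmp"))

This is best-effort: files that are still held open (which is exactly what trips up Spark's own hook on Windows) will simply be skipped.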
I'm facing the same problem after trying to run the WordCount example with the spark-submit command. Right now I'm ignoring it, because it returns the results before the error happens.
I found some old issues in the Spark Jira but didn't find any fixes. (BTW, one of them has status closed.)
https://issues.apache.org/jira/browse/SPARK-8333
https://issues.apache.org/jira/browse/SPARK-12216
Unfortunately, it seems they don't care about Spark on Windows at all.
One bad solution is to give everyone permission on the Temp folder (in your case C:\Users\415387\AppData\Local\Temp).
It would look like this:
winutils chmod -R 777 C:\Users\415387\AppData\Local\Temp\
But I strongly recommend that you not do that.
I've set the HADOOP_HOME variable in the same way as you have (on Windows 10).
Try using the complete path when setting permissions, i.e.
D:> winutils/bin/winutils.exe chmod 777 \tmp\hive
This worked for me.
Also, just a note on the exception: I'm getting the same exception when exiting Spark from cmd by running sys.exit.
But... I can exit cleanly when I use ":q" or ":quit". So I'm not sure what's happening here; still trying to figure it out...
The issue is in the ShutdownHook that tries to delete the temp files but fails. Though you cannot solve the issue, you can simply hide the exceptions by adding the following 2 lines to your log4j.properties file in %SPARK_HOME%\conf. If the file does not exist, copy the log4j.properties.template and rename it.
log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
log4j.logger.org.apache.spark.SparkEnv=ERROR
Out of sight is out of mind.