Exception while deleting Spark temp dir in Windows 7 64 bit

走了就别回头了 2021-02-12 23:43

I am trying to run a unit test of a Spark job on Windows 7 64-bit. I have:

HADOOP_HOME=D:/winutils

winutils path= D:/winutils/bin/winutils.exe

I r

10 Answers
  • 2021-02-13 00:24

    After following the suggestions above, I made the changes below.

    Update spark-defaults.conf, or create a copy of spark-defaults.conf.template
    and rename it to spark-defaults.conf.

    Add the following line, which sets the temp folder Spark will use:

    spark.local.dir=E:\spark2.4.6\tempDir

    Similarly, update log4j.properties in your Spark setup with the lines below:

    log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
    log4j.logger.org.apache.spark.SparkEnv=ERROR

    With this, ShutdownHookManager's output is suppressed during exit, so those error lines no longer appear on the console.

    How do you clean the temp folder, then? Add the lines below to the bin/spark-shell.cmd file:

    rmdir /q /s "E:\spark2.4.6\tempDir"
    del C:\Users\nitin\AppData\Local\Temp\jansi*.*

    With the above updates, I get a clean exit, and the temp folders are cleaned up as well.
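
    Since the original question is about unit tests, the same spark.local.dir setting can also be applied programmatically instead of via spark-defaults.conf. A minimal Scala sketch (the E:\ path is just this answer's example location):

    import org.apache.spark.sql.SparkSession

    object TempDirExample {
      def main(args: Array[String]): Unit = {
        // Point Spark's scratch space at a writable folder instead of %TEMP%;
        // the path mirrors the spark-defaults.conf line above
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("temp-dir-example")
          .config("spark.local.dir", "E:\\spark2.4.6\\tempDir")
          .getOrCreate()

        spark.range(10).count() // trivial job that exercises the temp dir
        spark.stop()
      }
    }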

  • 2021-02-13 00:26

    My Hadoop environment on Windows 10:

    HADOOP_HOME=C:\hadoop
    

    Spark and Scala versions:

    Spark-2.3.1 and Scala-2.11.8
    

    Below is my spark-submit command:

    spark-submit --class SparkScalaTest --master local[*] D:\spark-projects\SparkScalaTest\target\scala-2.11\sparkscalatest_2.11-0.1.jar D:\HDFS\output
    

    Based on my Hadoop environment on Windows 10, I defined the following system properties in my Scala main class:

    System.setProperty("hadoop.home.dir", "C:\\hadoop\\")
    System.setProperty("hadoop.tmp.dir", "C:\\hadoop\\tmp")
    

    Result: I still get the same error, but my outputs are generated in the output path D:\HDFS\output passed to spark-submit.

    Hope this helps you bypass this error and get the expected result when running Spark locally on Windows.
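
    For context, a minimal sketch of what such a Scala main class could look like (the class name and paths come from this answer; the SparkSession body is an assumed illustration, not the original code):

    import org.apache.spark.sql.SparkSession

    object SparkScalaTest {
      def main(args: Array[String]): Unit = {
        // Must be set before any Spark/Hadoop classes are initialized
        System.setProperty("hadoop.home.dir", "C:\\hadoop\\")
        System.setProperty("hadoop.tmp.dir", "C:\\hadoop\\tmp")

        val spark = SparkSession.builder().appName("SparkScalaTest").getOrCreate()

        // args(0) is the output path, e.g. D:\HDFS\output from the spark-submit above
        spark.range(100).write.mode("overwrite").csv(args(0))
        spark.stop()
      }
    }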

  • 2021-02-13 00:28

    I have a workaround for this: instead of letting Spark's ShutdownHookManager delete the temporary directories, you can issue Windows commands to do it.

    Steps:

    1. Change the temp directory using spark.local.dir in the spark-defaults.conf file.

    2. Set log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF in the log4j.properties file.

    3. spark-shell internally calls the spark-shell.cmd file, so add rmdir /q /s "your_dir\tmp" at the end of it (or do the cleanup from code, as in the sketch after this list).

    This should work!
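
    If you would rather do step 3's cleanup from code instead of editing spark-shell.cmd, a minimal Scala sketch (the directory path is a placeholder, matching the one above):

    import java.io.File

    object TempDirCleanup {
      // Recursively delete a directory; File.delete() returns false
      // (rather than throwing) on files Windows still holds locked
      def deleteRecursively(f: File): Unit = {
        val children = f.listFiles()
        if (children != null) children.foreach(deleteRecursively)
        f.delete()
      }

      def main(args: Array[String]): Unit = {
        deleteRecursively(new File("your_dir\\tmp")) // same dir as spark.local.dir
      }
    }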

  • 2021-02-13 00:32

    I'm facing the same problem after running the WordCount example with the spark-submit command. For now, I'm ignoring it because the results are returned before the error happens.

    I found some old issues in the Spark Jira but didn't find any fixes (one of them is even marked as closed):

    https://issues.apache.org/jira/browse/SPARK-8333

    https://issues.apache.org/jira/browse/SPARK-12216

    Unfortunately, it seems they don't care about Spark on Windows at all.

    One bad solution is to grant everyone permission to the Temp folder (in your case C:\Users\415387\AppData\Local\Temp). It would look like this:

    winutils chmod -R 777 C:\Users\415387\AppData\Local\Temp\
    

    But I strongly recommend that you not do that.

  • 2021-02-13 00:36

    I've set the HADOOP_HOME variable the same way you have (on Windows 10).

    Try using the complete path to winutils when setting permissions, i.e.:

    D:\> winutils\bin\winutils.exe chmod 777 \tmp\hive

    This worked for me.

    Also, just a note on the exception: I get the same exception when exiting Spark from cmd by running sys.exit.

    But... I can exit cleanly when I use :q or :quit. So I'm not sure what's happening here; still trying to figure it out...

  • 2021-02-13 00:36

    The issue is in the shutdown hook, which tries to delete the temp files but fails. Though you cannot solve the issue itself, you can simply hide the exceptions by adding the following two lines to the log4j.properties file in %SPARK_HOME%\conf. If the file does not exist, copy log4j.properties.template and rename it.

    log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
    log4j.logger.org.apache.spark.SparkEnv=ERROR
    

    Out of sight is out of mind.
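
    If editing files is inconvenient (for example in a unit test), the same two levels can be set programmatically. A minimal sketch, assuming Spark 2.x with its bundled log4j 1.x on the classpath:

    import org.apache.log4j.{Level, Logger}

    object QuietShutdown {
      def main(args: Array[String]): Unit = {
        // Same effect as the two log4j.properties lines above
        Logger.getLogger("org.apache.spark.util.ShutdownHookManager").setLevel(Level.OFF)
        Logger.getLogger("org.apache.spark.SparkEnv").setLevel(Level.ERROR)

        // ... run the Spark job here ...
      }
    }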
