Question
A Spark job launched on a Dataproc cluster fails with the exception below. I have tried various cluster configurations, but the result is the same. I am getting this error on Dataproc image 1.2.
Note: There are no preemptible workers, and there is sufficient disk space. However, I have noticed that the /hadoop/yarn/nm-local-dir/usercache/root folder does not exist at all on the worker nodes, although I can see a folder named dr.who.
java.io.IOException: Failed to create local dir in /hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1534256335401_0001/blockmgr-89931abb-470c-4eb2-95a3-8f8bfe5334d7/2f.
at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:80)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.getDataFile(IndexShuffleBlockResolver.scala:54)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
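The trace shows DiskBlockManager.getFile failing to create a subdirectory under the YARN local dir. As a rough way to check the observation about the missing usercache/root folder, the following minimal sketch could be run on a worker node as the user the containers run as. It is only a diagnostic aid and not part of Spark or Dataproc: the function name diagnose_local_dir is hypothetical, and it only reports the deepest ancestor directory that exists and whether the current user can write to it.

```python
import os


def diagnose_local_dir(path):
    """Hypothetical helper: find the deepest existing ancestor of `path`
    and report whether the current user could create directories in it."""
    ancestor = os.path.abspath(path)
    # Walk upwards until we reach a directory that actually exists.
    while not os.path.exists(ancestor):
        parent = os.path.dirname(ancestor)
        if parent == ancestor:  # reached the filesystem root
            break
        ancestor = parent
    return {
        "target_exists": os.path.isdir(path),
        "deepest_existing_ancestor": ancestor,
        # Creating a subdirectory needs write and execute permission.
        "ancestor_writable": os.access(ancestor, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Path taken from the exception message above.
    print(diagnose_local_dir("/hadoop/yarn/nm-local-dir/usercache/root/appcache"))
```

If ancestor_writable comes back False for the deepest existing directory, the failure is more likely a permissions or ownership problem than a lack of disk space.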
Possible duplicate of: Spark on Google's Dataproc failed due to java.io.FileNotFoundException: /hadoop/yarn/nm-local-dir/usercache/root/appcache/
Answer 1:
I was able to resolve the issue by switching to Dataproc image 1.3. However, the 1.3 image does not come with the BigQuery connector, so the connector has to be handled separately: https://cloud.google.com/dataproc/docs/concepts/connectors/bigquery
Source: https://stackoverflow.com/questions/51847291/spark-on-dataproc-fails-with-java-io-filenotfoundexception