I have a Spark job running on a distributed Spark cluster with 10 nodes, doing some text file processing on HDFS. The job runs fine until the last step.
I ran into the same problem and realized that if you run in standalone mode, the driver runs as your user while the executor processes run as root. The only changes you need are:

First, run sbt package to create the jar file, and make sure you run it as your own user, not as root. When I ran sbt package as root (with sudo), the assembly jar ended up somewhere else.

Once you have the assembly jar, do the spark-submit with sudo:
sudo /opt/spark-2.0/bin/spark-submit \
  --class ... \
  --master ... \
  ...
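As a fuller sketch of the two steps, here is what that workflow might look like; the main class, standalone master URL, and jar path below are hypothetical placeholders, so adjust them to your own project:

# Build the jar as a regular (non-root) user so it lands under ./target as expected.
sbt package

# Submit with sudo so the driver runs as root, matching the root-owned executors.
# Class name, master URL, and jar path are placeholders.
sudo /opt/spark-2.0/bin/spark-submit \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  target/scala-2.11/myapp_2.11-1.0.jar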
I had the same issue. It turned out that my Spark worker was running as root while my job was running as another user. When saveAsTextFile is called, the Spark worker first writes the data to a temporary location as root, and then the job, running as a different user, tries to move that root-owned temporary data to the final location and fails with a permission error.
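One way to confirm this mismatch, sketched below under the assumption that the job writes to an HDFS path like /user/myuser/output, is to check who owns the files Spark has written so far; note that chown on HDFS usually has to be run as the HDFS superuser:

# Hypothetical output path; use whatever your job passes to saveAsTextFile.
OUTPUT=/user/myuser/output

# Root-owned entries under the _temporary directory, while the job runs as
# another user, point to exactly this ownership mismatch.
hdfs dfs -ls -R "$OUTPUT"

# One workaround: hand ownership of the output directory to the job's user
# (typically needs the HDFS superuser, e.g. via sudo -u hdfs).
sudo -u hdfs hdfs dfs -chown -R myuser:myuser "$OUTPUT"

The cleaner fix is to run the workers and the job as the same user, as the earlier answer describes.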