I’m running a Spark application in YARN-client or YARN-cluster mode, but it seems to take too long to start up: it takes 10+ seconds to initialize the Spark context.
Tested on EMR:

cd /usr/lib/spark/jars/; zip /tmp/yarn-archive.zip *.jar
cd path/to/folder/of/someOtherDependancy/jarFolder/; zip /tmp/yarn-archive.zip jar-file.jar
zip -Tv /tmp/yarn-archive.zip

(-T tests the archive's integrity, -v gives verbose output.)
If yarn-archive.zip already exists on HDFS, remove the old copy first, then upload:

hdfs dfs -rm -r -f -skipTrash /user/hadoop/yarn-archive.zip
hdfs dfs -put /tmp/yarn-archive.zip /user/hadoop/

Otherwise, just upload it:

hdfs dfs -put /tmp/yarn-archive.zip /user/hadoop/
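Putting the steps above together, a minimal sketch of the whole build-and-upload flow might look like this (paths are the ones used above; since hdfs dfs -rm -f does not fail when the file is absent, the if/else can collapse into one unconditional remove-then-put):

```shell
#!/bin/sh
set -e

# Build the archive from the Spark jars (paths assume an EMR node).
cd /usr/lib/spark/jars/ && zip /tmp/yarn-archive.zip *.jar

# Verify the archive before uploading (-T tests integrity, -v is verbose).
zip -Tv /tmp/yarn-archive.zip

# Remove any previous copy on HDFS (-f makes this a no-op if absent),
# then upload the fresh archive.
hdfs dfs -rm -r -f -skipTrash /user/hadoop/yarn-archive.zip
hdfs dfs -put /tmp/yarn-archive.zip /user/hadoop/
```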
Then pass this argument to spark-submit:

--conf spark.yarn.archive="hdfs:///user/hadoop/yarn-archive.zip"
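For context, here is roughly what a full spark-submit invocation could look like with this flag; the --class value and the application jar name below are placeholders for your own application, not part of the original setup:

```shell
# Submit to YARN in cluster mode, pointing YARN at the pre-uploaded
# jar archive so it is localized from HDFS instead of shipped by the driver.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.archive="hdfs:///user/hadoop/yarn-archive.zip" \
  --class com.example.MyApp \
  my-app.jar
```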
The reason this works is that the master does not have to distribute all the jars to the slaves: they are already available to them from a common HDFS path, here hdfs:///user/hadoop/yarn-archive.zip.
In my case this saved 3-5 seconds of startup time; the exact saving depends on the number of nodes in the cluster. The more nodes, the more time you save.