I'm running a Spark application in YARN-client or YARN-cluster mode, but it seems to take too long to start up.
It takes 10+ seconds to initialize the Spark context.
Tested on EMR:
cd /usr/lib/spark/jars/; zip /tmp/yarn-archive.zip *.jar
cd path/to/folder/of/someOtherDependency/jarFolder/; zip /tmp/yarn-archive.zip jar-file.jar
zip -Tv /tmp/yarn-archive.zip
(-T tests the archive's integrity, -v gives verbose output)
If yarn-archive.zip already exists on HDFS, remove it first with hdfs dfs -rm -r -f -skipTrash /user/hadoop/yarn-archive.zip and then upload the new archive with hdfs dfs -put /tmp/yarn-archive.zip /user/hadoop/; otherwise just run the hdfs dfs -put command (see the sketch below).
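As a sketch, the check-and-upload step could be scripted like this (the archive path and HDFS destination are the ones used above):

if hdfs dfs -test -e /user/hadoop/yarn-archive.zip; then
    # remove any stale copy (skipping the trash) before uploading the fresh archive
    hdfs dfs -rm -r -f -skipTrash /user/hadoop/yarn-archive.zip
fi
hdfs dfs -put /tmp/yarn-archive.zip /user/hadoop/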
Use this argument in spark-submit:
--conf spark.yarn.archive="hdfs:///user/hadoop/yarn-archive.zip"
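For illustration, a full submit might look like this (the class name and application jar are placeholders, not from the original setup):

# hypothetical application jar and main class, shown only to illustrate where the flag goes
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.yarn.archive="hdfs:///user/hadoop/yarn-archive.zip" \
    --class com.example.MyApp \
    my-app.jar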
The reason this works is that the master does not have to distribute all the jars to the slaves; they are already available from a common HDFS path, here hdfs:///user/hadoop/yarn-archive.zip.
I found that this saves about 3-5 seconds; the exact saving also depends on the number of nodes in the cluster: the more nodes, the more time you save.
If you're using macOS to run tasks in standalone mode,
just remember to enable the remote SSH connection
in System Preferences -> Sharing.
(I don't know why this is necessary.)
Before enabling it, running spark-submit xx.py took me about 1 minute;
after enabling it, it takes only about 3 seconds.
I hope this helps others who hit the same kind of issue on macOS.
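If you prefer the terminal, Remote Login can also be toggled with the built-in systemsetup tool (needs admin rights); this is just an alternative to the System Preferences route above:

# check whether Remote Login (SSH) is currently enabled
sudo systemsetup -getremotelogin
# enable it
sudo systemsetup -setremotelogin on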
You could check out Apache Livy, which is a REST API in front of Spark.
You could have one long-running session and send multiple requests to that single Spark/Livy session.
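As a rough sketch (the Livy host is an assumption; 8998 is just Livy's default port), creating one session and reusing it for several statements looks roughly like this:

# create one long-running Spark session via Livy
curl -s -X POST -H "Content-Type: application/json" \
     -d '{"kind": "pyspark"}' \
     http://livy-host:8998/sessions

# once the session (id 0 here) is idle, submit statements against it
# without paying the SparkContext startup cost again
curl -s -X POST -H "Content-Type: application/json" \
     -d '{"code": "sc.parallelize(range(100)).count()"}' \
     http://livy-host:8998/sessions/0/statements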
This is pretty typical. My system takes about 20 seconds from running spark-submit until getting a SparkContext.
As it says in the docs in a couple of places, the solution is to turn your driver into an RPC server. That way you initialize once, and then other applications can use the driver's context as a service.
I am in the middle of doing this with my application. I am using http4s and turning my driver into a web server.
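To make the idea concrete: once the driver runs as a long-lived web server, other applications only pay the cost of an HTTP call rather than a fresh SparkContext. The endpoint and payload below are purely hypothetical (not from this answer), just to show what a client call against such a service might look like:

# hypothetical endpoint exposed by a long-running Spark driver;
# the path and JSON payload are made up for illustration only
curl -X POST -H "Content-Type: application/json" \
     -d '{"inputPath": "hdfs:///data/events", "date": "2017-01-01"}' \
     http://driver-host:8080/jobs/daily-report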