How to submit multiple Spark applications in parallel without spawning separate JVMs?

一生所求 2021-02-09 08:20

The problem is that you need to launch a separate JVM to create a separate session with a different amount of RAM per job.

How can you submit a few Spark applications simultaneously?

2 Answers
  •  甜味超标
    2021-02-09 09:07

    With a use case, this is much clearer now. There are two possible solutions:

    If you require shared data between those jobs, use the FAIR-scheduler and a (REST-)frontend (as does SparkJobServer, Livy, etc.). You don't need to use SparkJobServer either, it should be relatively easy to code, if you have a fixed scope. I've seen projects go in that direction. All you need is an event loop and a way to translate your incoming queries into Spark queries. In a way, I would expect there to be demand for a library to cover this use case, since it's pretty much always the first thing you have to build, when you work on a Spark-based application/framework. In this case, you can size your executors according to your hardware, Spark will manage scheduling of your jobs. With Yarn's dynamic resource allocation, Yarn will also free resources (kill executors), should your framework/app be idle. For more information, read here: http://spark.apache.org/docs/latest/job-scheduling.html
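    As a sketch of how that first option is wired up, these are the relevant properties from the job-scheduling docs linked above (the allocation-file path is a placeholder you would replace with your own):

    ```
    # spark-defaults.conf (or pass each line via --conf on spark-submit)
    spark.scheduler.mode              FAIR
    spark.scheduler.allocation.file   /path/to/fairscheduler.xml   # placeholder path; defines your pools
    spark.dynamicAllocation.enabled   true
    spark.shuffle.service.enabled     true   # required for dynamic allocation on YARN
    ```

    Your frontend's handler threads can then route each incoming request to a pool by calling `sc.setLocalProperty("spark.scheduler.pool", "poolName")` before triggering any action; jobs in different pools are scheduled fairly against each other within the single shared session.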

    If you don't need shared data, use YARN (or another resource manager) to assign resources to both jobs in a fair manner. YARN has a fair scheduling mode, and you can set the resource demands per application. If you think this suits you but you still need shared data, then you might want to think about using Hive or Alluxio to provide a data interface. In this case you would run two spark-submits and maintain multiple drivers in the cluster. Building additional automation around spark-submit can help make this less annoying and more transparent to end users. This approach also has higher latency, since resource allocation and SparkSession initialization take a more or less constant amount of time for every application you launch.
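    For that second option, the two jobs are simply two independent spark-submit invocations, each with its own driver and its own memory budget. A minimal sketch (jar names, class names, and YARN queue names below are placeholders):

    ```shell
    # Application A: many small executors, submitted to its own YARN queue
    spark-submit \
      --master yarn --deploy-mode cluster \
      --queue analytics \
      --driver-memory 2g --executor-memory 4g --num-executors 10 \
      --class com.example.JobA jobA.jar

    # Application B: fewer, larger executors, isolated in a different queue
    spark-submit \
      --master yarn --deploy-mode cluster \
      --queue etl \
      --driver-memory 1g --executor-memory 8g --num-executors 4 \
      --class com.example.JobB jobB.jar
    ```

    Because each submission gets its own JVMs, the per-job RAM sizing from the question falls out naturally here; YARN's fair scheduler arbitrates between the two queues.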
