How to deal with executor memory and driver memory in Spark?

旧时难觅i 2021-01-30 02:17

I am confused about dealing with executor memory and driver memory in Spark.

My environment settings are as below:

  • 128 GB of memory and 16 CPUs for 9 VMs
  • C
2 Answers
  • 2021-01-30 02:47

    In a Spark application, the driver is responsible for task scheduling, and the executors are responsible for running the concrete tasks in your job.

    If you are familiar with MapReduce, your map and reduce tasks all run in executors (in Spark they are called ShuffleMapTasks and ResultTasks). Likewise, any RDD you cache is stored in the executors' JVM heap and on their disks.

    So I think a few GB will be enough for your driver.
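To make the split concrete, here is a minimal sketch of where the two settings usually go. It assumes spark-core is on the classpath; the app name and the `8g` value are placeholders for illustration, not recommendations for the cluster in the question. Executor memory can be set on the `SparkConf`, whereas in client mode driver memory has to be supplied before the driver JVM starts (for example with `--driver-memory` on `spark-submit`).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MemorySettingsSketch {
  // Hypothetical values for illustration only; tune them for your own cluster.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("memory-settings-sketch")
      // Fall back to local mode when no master was supplied by spark-submit.
      .setIfMissing("spark.master", "local[2]")
      // Heap size requested for each executor JVM.
      .set("spark.executor.memory", "8g")
      // Driver memory is deliberately NOT set here: in client mode the driver
      // JVM is already running by the time this code executes, so pass it on
      // the command line instead, e.g. spark-submit --driver-memory 2g ...

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      println(s"executor memory = ${sc.getConf.get("spark.executor.memory")}")
    } finally {
      sc.stop()
    }
  }
}
```

The same keys can also be placed in `spark-defaults.conf` if you would rather keep sizing out of the application code.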

  • 2021-01-30 02:55

    The memory you need to assign to the driver depends on the job.

    If the job is based purely on transformations and ends with a distributed output action such as rdd.saveAsTextFile or rdd.saveToCassandra, the driver's memory needs will be very low; a few hundred MB will do. The driver is also responsible for delivering files and collecting metrics, but it is not involved in data processing.
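A job of this shape might look like the sketch below. It assumes spark-core is on the classpath; the input data, the transformations, and the object name are made up for illustration. Every step runs on the executors, and each task writes its own part file, so no records pass through the driver.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object DistributedOutputSketch {
  // Transformations plus a distributed output action: the driver only
  // schedules tasks, it never holds the records themselves.
  def run(sc: SparkContext, outputDir: String): Unit = {
    sc.parallelize(1 to 1000, numSlices = 4)
      .map(n => n * 2)            // executed on the executors
      .filter(n => n % 3 == 0)    // executed on the executors
      .saveAsTextFile(outputDir)  // each task writes its own part file
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("distributed-output-sketch")
      .setIfMissing("spark.master", "local[2]")
    val sc = new SparkContext(conf)
    try {
      // saveAsTextFile requires a directory that does not exist yet.
      val out = Files.createTempDirectory("spark-sketch").resolve("out").toString
      run(sc, out)
      println(s"wrote part files under $out")
    } finally {
      sc.stop()
    }
  }
}
```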

    If the job requires the driver to take part in the computation, for example an ML algorithm that materializes results and broadcasts them on the next iteration, then the driver's memory needs depend on the amount of data passing through it. Operations such as .collect, .take and .takeSample deliver data to the driver, so the driver needs enough memory to hold that data.

    For example, if you have an RDD of 3 GB in the cluster and call val myresultArray = rdd.collect, you will need 3 GB of memory in the driver to hold that data, plus some extra room for the housekeeping duties mentioned above.
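The difference between these actions can be sketched as follows, assuming spark-core is on the classpath. The helper names `preview` and `total` are hypothetical, and the data set is tiny so that the `collect` call is harmless here. The idea is to bring back only what the driver really needs: a bounded sample via `take`, or a value already aggregated on the executors.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DriverSideResultsSketch {
  // Brings at most n elements back to the driver.
  def preview(rdd: RDD[Int], n: Int): Array[Int] = rdd.take(n)

  // Aggregates on the executors; only one Long per partition reaches the driver.
  def total(rdd: RDD[Int]): Long = rdd.map(_.toLong).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("driver-side-results-sketch")
      .setIfMissing("spark.master", "local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 100000, numSlices = 8)

      // collect materializes the ENTIRE RDD in the driver's heap, so the
      // driver must be sized for the full data set.
      val everything: Array[Int] = rdd.collect()
      println(s"collected ${everything.length} elements on the driver")

      // These keep the driver-side footprint small regardless of RDD size.
      println(s"preview: ${preview(rdd, 5).mkString(", ")}")
      println(s"total: ${total(rdd)}")
    } finally {
      sc.stop()
    }
  }
}
```

If a large `collect` is unavoidable, the `spark.driver.maxResultSize` setting is also worth checking, since it caps the total size of serialized results an action may return to the driver.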
