I know there are plenty of questions on SO about out of memory errors on Spark but I haven't found a solution to mine.
I have a simple workflow:
When you call collect on the dataframe, there are 2 things happening:

1. All of the data has to be sent from the executors to the driver.
2. The driver has to hold the entire dataset in its own memory, so the job fails with an OutOfMemoryError whenever the result is larger than the driver heap.
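A minimal PySpark sketch of that failure mode (the session setup, input path, and app name below are placeholders, since the question's actual workflow isn't shown):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("collect-oom-demo").getOrCreate()

# Hypothetical input; substitute the dataframe from your own workflow.
df = spark.read.parquet("s3a://some-bucket/some-table/")

# collect() ships every partition from the executors to the driver and
# builds a single local list there, so the whole result must fit inside
# spark.driver.memory or the driver dies with java.lang.OutOfMemoryError.
rows = df.collect()
print(len(rows))
```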
Answer:
If you are looking to just load the data into the memory of the executors, count() is also an action, and together with cache() or persist() it materializes the data in executor memory, where it can be reused by subsequent jobs.
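As a sketch of that pattern (reusing the hypothetical df from above), caching the dataframe and triggering it with count() materializes the rows in executor memory without routing anything through the driver:

```python
from pyspark import StorageLevel

# cache()/persist() only marks the dataframe for storage; count() is the
# action that actually materializes it. The rows stay distributed across
# the executors, so the driver heap is never a bottleneck here.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()

# Later jobs read from the executor-side cache instead of re-scanning the
# source ("value" is a hypothetical column name).
filtered_count = df.filter(df["value"] > 0).count()
```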
If you want to extract the data to the driver, then try raising the result-size limit along with the other memory properties when pulling the data: `--conf spark.driver.maxResultSize=10g`.
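For instance, a minimal sketch of setting that limit at session creation (10g is just the figure from the flag above; size it to your expected result):

```python
from pyspark.sql import SparkSession

# spark.driver.maxResultSize caps the total serialized size that a single
# action such as collect() may send back to the driver; exceeding the cap
# aborts the job instead of silently OOMing. With spark-submit you would
# pass it as shown above:
#   spark-submit --conf spark.driver.maxResultSize=10g your_app.py
spark = (
    SparkSession.builder
    .appName("big-collect")  # hypothetical app name
    .config("spark.driver.maxResultSize", "10g")
    .getOrCreate()
)

# The collected rows still have to fit on the driver heap, so the driver's
# JVM memory (spark-submit --driver-memory) usually needs raising as well.
```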