I have a folder with 150 GB of txt files (around 700 files, on average 200 MB each).
I'm using Scala to process the files and calculate some aggregate statistics in the end.
To add another perspective based on code (as opposed to configuration): sometimes it's best to figure out at which stage your Spark application is running out of memory, and then see whether you can change the code to fix the problem. When I was learning Spark, I had a Python Spark application that crashed with OOM errors. The cause was that I was collecting all the results back in the driver rather than letting the tasks save the output.
E.g. the collect-and-print loop below was replaced with a direct save:

# Before: pulls every result back to the driver and triggers the OOM
for item in processed_data.collect():
    print(item)

# After: each task writes its own partition of the output
processed_data.saveAsTextFile(output_dir)
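
Since the question uses Scala, here is a minimal sketch of the same idea in Scala. The input path, output path, and the word-count aggregation are placeholders standing in for whatever statistics you are actually computing:

import org.apache.spark.sql.SparkSession

object AggregateStats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("aggregate-stats").getOrCreate()
    val sc = spark.sparkContext

    // Placeholder paths: point these at your txt files and an output directory
    val inputDir  = "/data/txt-files"
    val outputDir = "/data/stats-output"

    // Placeholder aggregation: word counts stand in for your real statistics
    val stats = sc.textFile(inputDir)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    // Avoid stats.collect().foreach(println): it pulls the entire result into
    // driver memory and will hit the same OOM once the result no longer fits.
    stats.saveAsTextFile(outputDir)

    spark.stop()
  }
}

With saveAsTextFile each task writes its own part file straight from the executors, so the full result set never has to fit in the driver's memory.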