Spark out of memory

面向向阳花 2021-02-04 12:41

I have a folder with 150 GB of txt files (around 700 files, about 200 MB each on average).

I'm using Scala to process the files and calculate some aggregate statistics in the

3 Answers
  •  别跟我提以往 2021-02-04 12:57

    To add another perspective based on code (as opposed to configuration): sometimes it's best to figure out at what stage your Spark application exceeds memory, and to see whether a code change can fix the problem. When I was learning Spark, I had a Python Spark application that crashed with OOM errors. The reason was that I was collecting all the results back on the driver rather than letting the tasks save the output.

    E.g.

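    # collect() pulls every record of processed_data into the driver's memory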
    for item in processed_data.collect():
       print(item)
    
    failed with OOM errors. On the other hand,

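    # saveAsTextFile() lets each task write its own partition directly to output_dir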
    processed_data.saveAsTextFile(output_dir)

    worked fine.
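
    For reference, here is a minimal, self-contained sketch of that pattern in PySpark. The input and output paths (input_dir, output_dir) and the word-count aggregate are placeholders for illustration, not details taken from the original question.

    # A sketch of the "let the tasks write the output" pattern, assuming
    # placeholder paths and a word-count aggregate.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("aggregate-stats-sketch")
    sc = SparkContext(conf=conf)

    # Each partition of the input is read and processed on the executors.
    lines = sc.textFile("input_dir")

    # Example aggregation: count occurrences of each token.
    processed_data = (lines.flatMap(lambda line: line.split())
                           .map(lambda token: (token, 1))
                           .reduceByKey(lambda a, b: a + b))

    # Each task writes its own partition of results to distributed storage;
    # nothing is gathered on the driver, unlike collect().
    processed_data.saveAsTextFile("output_dir")

    sc.stop()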
