I am generating a hierarchy for a table, determining the parent-child relationships. Below is the configuration used; even with it, I still get the "too large frame" error.
I got the exact same error when trying to backfill a few years of data. It turns out it's because your partitions are larger than 2 GB.
You can either bump up the number of partitions (using repartition()) so that each partition stays under 2 GB. (Keep your partitions close to 128 MB to 256 MB, i.e. close to the HDFS block size.)
Or you can bump up the shuffle limit to > 2 GB as mentioned above (avoid this). Also, partitions holding large amounts of data result in tasks that take a long time to finish.
Note: writing a DataFrame with n partitions (e.g. after repartition(n)) produces n part files in the output directory on S3/HDFS.
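As a rough sketch of the first option, the sizing heuristic above can be turned into a small helper. The data size, partition target, and paths here are illustrative assumptions, not values from the original setup:

```python
import math

def partition_count(total_bytes, target_partition_mb=256):
    """Partitions needed so each holds roughly target_partition_mb of data."""
    return max(1, math.ceil(total_bytes / (target_partition_mb * 1024 ** 2)))

# e.g. ~500 GiB of backfill data at ~256 MiB per partition:
n = partition_count(500 * 1024 ** 3)
print(n)  # 2000 partitions, each well under the 2 GB limit

# With a Spark DataFrame you would then repartition before writing
# (df and the output path are hypothetical):
# df.repartition(n).write.parquet("s3://bucket/output/")
```

Estimating from the input size like this is a sketch; in practice you would check the actual partition sizes in the Spark UI after a run and adjust.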
Read this for more info: http://www.russellspitzer.com/2018/05/10/SparkPartitions/