Spark Failure : Caused by: org.apache.spark.shuffle.FetchFailedException: Too large frame: 5454002341

遥遥无期 2021-01-06 07:51

I am generating a hierarchy for a table, determining the parent-child relationships.

Below is the configuration used; even so, I am still getting the error about the frame being too large.

5 answers
  • 2021-01-06 08:16

    I was experiencing the same issue while working on a ~700 GB dataset. Decreasing spark.maxRemoteBlockSizeFetchToMem didn't help in my case, and I wasn't able to increase the number of partitions either.

    Doing the following worked for me:

    1. Setting spark.network.timeout=600s (the default is 120s in Spark 2.3). This value is used as the default for the following timeouts when they are not configured:
       spark.core.connection.ack.wait.timeout
       spark.storage.blockManagerSlaveTimeoutMs
       spark.shuffle.io.connectionTimeout
       spark.rpc.askTimeout
       spark.rpc.lookupTimeout

    2. Setting spark.io.compression.lz4.blockSize=512k (the default is 32k in Spark 2.3)

    3. Setting spark.shuffle.file.buffer=1024k (the default is 32k in Spark 2.3)
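
    A minimal sketch (in Scala) of applying these settings when building the session, assuming you set them in code rather than through spark-defaults.conf or spark-submit; the app name is a placeholder:

        import org.apache.spark.sql.SparkSession

        // Raise the shuffle/network-related settings before any job runs.
        val spark = SparkSession.builder()
          .appName("hierarchy-job")                             // placeholder app name
          .config("spark.network.timeout", "600s")              // default 120s in Spark 2.3
          .config("spark.io.compression.lz4.blockSize", "512k") // default 32k
          .config("spark.shuffle.file.buffer", "1024k")         // default 32k
          .getOrCreate()

    The same values can also be passed as --conf flags to spark-submit, which avoids a code change.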

  • 2021-01-06 08:31

    Suresh is right. Here's a better documented & formatted version of his answer with some useful background info:

    • bug report (link to the fix is at the very bottom)
    • fix (fixed as of 2.2.0 - already mentioned by Jared)
    • change of config's default value (changed as of 2.4.0)

    If you're on version 2.2.x or 2.3.x, you can achieve the same effect by setting the config to Int.MaxValue - 512, i.e. spark.maxRemoteBlockSizeFetchToMem=2147483135. See here for the default value used as of September 2019.
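
    A minimal sketch of setting that value, either at submit time or on the session builder (the app name is a placeholder):

        import org.apache.spark.sql.SparkSession

        // Equivalent to: spark-submit --conf spark.maxRemoteBlockSizeFetchToMem=2147483135 ...
        val spark = SparkSession.builder()
          .appName("hierarchy-job")                                    // placeholder app name
          .config("spark.maxRemoteBlockSizeFetchToMem", "2147483135")  // Int.MaxValue - 512
          .getOrCreate()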

  • 2021-01-06 08:32

    Got the exact same error when trying to backfill a few years of data. It turns out it's because your partitions are larger than 2 GB.

    1. You can either bump up the number of partitions (using repartition()) so that your partitions stay under 2 GB. Keep partitions close to 128 MB to 256 MB, i.e. close to the HDFS block size (see the sketch below).

    2. Or you can bump up the shuffle limit to more than 2 GB, as mentioned above (avoid this). Partitions holding a large amount of data also result in tasks that take a long time to finish.

    Note: repartition(n) will result in n part files in total (one per partition) when writing to S3/HDFS.
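
    A rough sketch of sizing the shuffle along these lines, assuming you know (or can estimate) the input size; the path, the 500 GB figure, and the 128 MB target are placeholders to adjust for your own data:

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().getOrCreate()
        val df = spark.read.parquet("/path/to/input")     // placeholder input path

        // Aim for ~128 MB per partition so no single partition approaches the 2 GB limit.
        val inputBytes    = 500L * 1024 * 1024 * 1024     // assumed ~500 GB of input
        val targetBytes   = 128L * 1024 * 1024            // ~128 MB per partition
        val numPartitions = math.max(1L, inputBytes / targetBytes).toInt

        val repartitioned = df.repartition(numPartitions) // one part file per partition on write

    Writing repartitioned out then produces numPartitions part files, in line with the note above.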

    Read this for more info: http://www.russellspitzer.com/2018/05/10/SparkPartitions/

  • 2021-01-06 08:38

    Use this Spark config: set spark.maxRemoteBlockSizeFetchToMem to a value below 2 GB.

    Since there are a lot of issues with partitions larger than 2 GB (they cannot be shuffled and cannot be cached on disk), Spark throws FetchFailedException: Too large frame.

  • 2021-01-06 08:38

    This means the partitions of your dataset are too large. You need to repartition your dataset into more partitions.

    You can do this using:

    df.repartition(n)

    Here, n depends on the size of your dataset.
