What is shuffle read & shuffle write in Apache Spark


心在旅途 2021-02-03 23:07

In the screenshot below of the Spark admin UI running on port 8080:

[screenshot of the Spark admin UI]

The "Shuffle R

2 answers

慢半拍i (OP)

2021-02-03 23:27

Shuffling means the redistribution of data between Spark stages. "Shuffle Write" is the total size of the serialized data written on all executors before it is transmitted (normally at the end of a stage), and "Shuffle Read" is the total size of the serialized data read on all executors at the beginning of a stage.
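To make the two counters concrete, here is a toy simulation in plain Python. It is not Spark's implementation, just a sketch assuming hash partitioning and `pickle` as the serializer; the names `shuffle_write` and `shuffle_read` and the sample data are made up for illustration. Each map task writes one serialized block per reducer, and each reducer then fetches its block from every map task.

```python
import pickle
from collections import defaultdict

def shuffle_write(records, num_reducers):
    """Map side: bucket (key, value) records by target reducer, then serialize each bucket."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[hash(key) % num_reducers].append((key, value))
    return {r: pickle.dumps(rows) for r, rows in buckets.items()}

def shuffle_read(map_outputs, reducer_id):
    """Reduce side: fetch this reducer's block from every map task and deserialize it."""
    rows, bytes_read = [], 0
    for blocks in map_outputs:
        block = blocks.get(reducer_id)
        if block is not None:
            bytes_read += len(block)
            rows.extend(pickle.loads(block))
    return rows, bytes_read

# Two map-side partitions of a word-count style dataset (hypothetical data).
partitions = [[("a", 1), ("b", 1), ("a", 1)], [("b", 1), ("c", 1)]]
num_reducers = 2

# End of the first stage: every map task writes its shuffle blocks.
map_outputs = [shuffle_write(p, num_reducers) for p in partitions]
total_written = sum(len(b) for blocks in map_outputs for b in blocks.values())

# Start of the next stage: every reducer reads its blocks and aggregates.
total_read = 0
counts = {}
for r in range(num_reducers):
    rows, n = shuffle_read(map_outputs, r)
    total_read += n
    for key, value in rows:
        counts[key] = counts.get(key, 0) + value

print("shuffle write bytes:", total_written)
print("shuffle read bytes:", total_read)
print("counts:", counts)
```

In this sketch the two totals match because every written block is read exactly once by the following stage.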

Your program has only one stage, triggered by the "collect" operation. No shuffling is required, because it consists only of consecutive map operations, which are pipelined into a single stage.
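A rough way to picture the pipelining is the sketch below. It assumes a simplified rule where narrow transformations stay in the current stage and every wide transformation starts a new one; the `NARROW`/`WIDE` sets and `split_into_stages` are hypothetical helpers, not part of Spark, and the real scheduler is more nuanced (a join of co-partitioned RDDs, for instance, needs no shuffle).

```python
# Simplified classification of a few RDD transformations (illustrative, not exhaustive).
NARROW = {"map", "filter", "flatMap", "mapValues"}
WIDE = {"reduceByKey", "groupByKey", "sortByKey", "repartition"}

def split_into_stages(ops):
    """Group a chain of transformations into stages, cutting at each wide operation."""
    stages = [[]]
    for op in ops:
        if op in WIDE:
            # A shuffle boundary: the data is written out, then read by a new stage.
            stages.append([op])
        elif op in NARROW:
            # Narrow operations are pipelined into the current stage.
            stages[-1].append(op)
        else:
            raise ValueError(f"unknown operation: {op}")
    return stages

# Only map-like operations: a single stage, so no shuffle read or write.
map_only = split_into_stages(["map", "map", "filter"])
print(len(map_only), map_only)

# Adding reduceByKey introduces a shuffle and therefore a second stage.
with_shuffle = split_into_stages(["map", "reduceByKey", "map"])
print(len(with_shuffle), with_shuffle)
```

An action such as "collect" only triggers the job; under this simplified model it does not add a stage of its own.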

Try to take a look at these slides: http://de.slideshare.net/colorant/spark-shuffle-introduction

It may also help to read chapter 5 of the original paper: http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf
