What is shuffle read & shuffle write in Apache Spark

In the screenshot below of the Spark admin UI running on port 8080:

[Screenshot of the Spark admin UI]

The "Shuffle R

2 answers

清歌不尽

2021-02-03 23:24

I believe you have to run your application in cluster/distributed mode to see any Shuffle Read or Shuffle Write values. Typically, a shuffle is triggered by a subset of Spark operations, namely wide transformations such as groupBy and join.

慢半拍i

2021-02-03 23:27

Shuffling means the redistribution of data between Spark stages. "Shuffle Write" is the total amount of serialized data written on all executors before transmission (normally at the end of a stage), and "Shuffle Read" is the total amount of serialized data read on all executors at the beginning of a stage.
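To make the two counters concrete, here is a minimal toy model in plain Python. It is not Spark code and not how Spark implements shuffles; the function name `shuffle`, the use of `pickle` for serialization, and the hash partitioning are all assumptions made for illustration. It only shows the idea that the "write" total is counted when map-side tasks serialize their output, and the "read" total when the next stage fetches it.

```python
import pickle
from collections import defaultdict


def shuffle(map_outputs, num_reducers):
    """Toy model of a shuffle between two stages (not Spark's implementation).

    map_outputs: one list of (key, value) records per map task.
    Returns (reducer_inputs, bytes_written, bytes_read).
    """
    # "Shuffle write": at the end of the first stage, each map task
    # serializes its records into one bucket per reducer.
    blocks = []  # list of (reducer_index, serialized_bytes)
    bytes_written = 0
    for records in map_outputs:
        buckets = defaultdict(list)
        for key, value in records:
            buckets[hash(key) % num_reducers].append((key, value))
        for reducer, bucket in buckets.items():
            data = pickle.dumps(bucket)
            bytes_written += len(data)
            blocks.append((reducer, data))

    # "Shuffle read": at the beginning of the next stage, each reducer
    # fetches and deserializes the blocks addressed to it.
    reducer_inputs = [[] for _ in range(num_reducers)]
    bytes_read = 0
    for reducer, data in blocks:
        bytes_read += len(data)
        reducer_inputs[reducer].extend(pickle.loads(data))
    return reducer_inputs, bytes_written, bytes_read


# Two map tasks, integer keys so the partitioning is deterministic.
map_outputs = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
inputs, written, read = shuffle(map_outputs, num_reducers=2)
print(written, read)
```

In this toy model the two totals are equal by construction, and all records with the same key end up at the same reducer. In a real cluster the numbers shown in the UI are per stage, so the write of one stage shows up as the read of the stage that follows it.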

Your program has only one stage, triggered by the "collect" operation. No shuffling is required, because it consists only of consecutive map operations, which are pipelined into a single stage.
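The pipelining argument can be sketched in a few lines of plain Python. This is a simplification, not Spark's scheduler: the `WIDE` set and the `split_into_stages` helper are hypothetical names introduced here, and the model only handles a linear chain of operations that begins with a data source.

```python
# Hypothetical classification for this sketch: operations that
# typically need their input shuffled.
WIDE = {"groupBy", "join", "reduceByKey"}


def split_into_stages(ops):
    """Group a linear chain of operation names into stages.

    Assumes ops starts with a data source. Narrow operations are
    pipelined into the current stage; a wide operation needs shuffled
    input, so it starts a new stage.
    """
    stages = [[]]
    for op in ops:
        if op in WIDE:
            stages.append([])
        stages[-1].append(op)
    return stages


# Only maps before the action: one stage, so nothing to shuffle.
print(split_into_stages(["parallelize", "map", "map", "collect"]))

# A groupBy in the chain introduces a stage boundary, i.e. a shuffle.
print(split_into_stages(["parallelize", "map", "groupBy", "map", "collect"]))
```

Under these assumptions, a job like the one in the question stays in a single stage, while adding a groupBy would split it in two and produce non-zero shuffle values.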

Take a look at these slides: http://de.slideshare.net/colorant/spark-shuffle-introduction

It could also help to read chapter 5 of the original paper: http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf
