Does Spark automatically cache some results?


I ran an action twice, and the second run took very little time, so I suspect that Spark automatically caches some results. But I could not find any source confirming this.


1 Answer

半阙折子戏

2021-01-13 16:22

From the documentation:

Spark also automatically persists some intermediate data in shuffle operations (e.g. reduceByKey), even without users calling persist. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call persist on the resulting RDD if they plan to reuse it.

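The underlying filesystem also caches disk reads (the operating system's page cache), so re-reading the same input files on a second run can be faster even though Spark itself has not cached anything.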
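To illustrate why a second action can be much cheaper, here is a toy model in plain Python. It is not Spark code and does not use Spark's API: `ToyShuffledDataset`, `expensive_map`, and the `_shuffle_files` attribute are hypothetical names made up for this sketch. It only mimics the behaviour described in the documentation quote, assuming the map-side shuffle output is kept after the first action and reused by later ones.

```python
from collections import defaultdict

map_calls = 0  # counts how often the (pretend) expensive map function runs


def expensive_map(x):
    """Stand-in for costly per-record work done before the shuffle."""
    global map_calls
    map_calls += 1
    return (x % 3, x)


class ToyShuffledDataset:
    """Toy stand-in for the result of a reduceByKey; NOT Spark's API."""

    def __init__(self, source, map_fn, reduce_fn):
        self.source = source
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self._shuffle_files = None  # map-side output, kept once written

    def _map_side(self):
        # Reuse the shuffle output if an earlier action already produced it.
        if self._shuffle_files is None:
            buckets = defaultdict(list)
            for item in self.source:
                key, value = self.map_fn(item)
                buckets[key].append(value)
            self._shuffle_files = dict(buckets)
        return self._shuffle_files

    def collect(self):
        # In this toy model the reduce side is recomputed on every action.
        result = {}
        for key, values in self._map_side().items():
            acc = values[0]
            for v in values[1:]:
                acc = self.reduce_fn(acc, v)
            result[key] = acc
        return result


dataset = ToyShuffledDataset(range(9), expensive_map, lambda a, b: a + b)

first = dataset.collect()       # runs expensive_map on every record
calls_after_first = map_calls

second = dataset.collect()      # reuses the stored map-side output
calls_after_second = map_calls

print(first, calls_after_first, calls_after_second)
```

In this sketch only the map side is reused; the reduce side still runs again on each `collect()`. That is the reason the documentation still recommends calling `persist` on the resulting RDD if you plan to reuse it.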
