Does Spark automatically cache some results?

渐次进展 2021-01-13 15:36

I ran the same action twice, and the second run took very little time, so I suspect that Spark automatically caches some results. However, I could not find any source confirming this.

Im

1 answer
  • 2021-01-13 16:22

    From the Spark programming guide (RDD Persistence section):

    Spark also automatically persists some intermediate data in shuffle operations (e.g. reduceByKey), even without users calling persist. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call persist on the resulting RDD if they plan to reuse it.

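    Independently of Spark, the operating system's filesystem cache keeps recently read disk blocks in memory, so a second read of the same input files is usually faster as well.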
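
To check this on your own job, here is a minimal PySpark sketch that times the same action before and after an explicit `persist()`. The helper names `timed` and `compare_runs`, the generated data, and the key function are all made up for illustration; `sc` is assumed to be an existing `SparkContext`. Actual timings depend on your data and cluster, so none are shown.

```python
import time


def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def compare_runs(sc):
    """Time one action three times: cold, repeated, and after persist().

    `sc` is an existing SparkContext (assumption); the input data below is
    synthetic and only serves to force a shuffle.
    """
    pairs = sc.parallelize(range(1_000_000)).map(lambda x: (x % 100, 1))
    counts = pairs.reduceByKey(lambda a, b: a + b)  # shuffle operation

    _, first = timed(counts.count)    # computes the full lineage, including the shuffle
    _, second = timed(counts.count)   # may reuse shuffle output kept by Spark

    counts.persist()                  # explicit caching, as the guide recommends
    counts.count()                    # first action after persist() fills the cache
    _, cached = timed(counts.count)   # reads the persisted partitions

    return first, second, cached
```

With a local PySpark installation, you could call it as `compare_runs(SparkContext("local[*]", "cache-demo"))`. If the second number is already much smaller than the first, that is consistent with Spark reusing shuffle output (or with filesystem caching), but only `persist()` makes the reuse something you can rely on.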
