What happens if I cache the same RDD twice in Spark

感动是毒 2021-01-05 16:40

I'm building a generic function which receives an RDD and does some calculations on it. Since I run more than one calculation on the input RDD, I would like to cache it. For
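
A minimal sketch of what such a generic function might look like (assuming Scala; the helper name and the particular calculations are made up, the caching pattern is the point):

    import org.apache.spark.rdd.RDD

    // Hypothetical generic helper: cache the incoming RDD, run several
    // calculations on it, then release the cached blocks.
    def summarize(input: RDD[Double]): (Long, Double, Double) = {
      input.cache()               // mark for caching; a no-op if the RDD is already cached
      val count  = input.count()  // first action materializes the RDD and fills the cache
      val total  = input.sum()    // served from the cached partitions
      val maxVal = input.max()    // served from the cached partitions
      input.unpersist()
      (count, total, maxVal)
    }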

2 Answers
  •  执念已碎
    2021-01-05 17:21

    Nothing. If you call cache on an already-cached RDD, nothing happens: the RDD is cached only once. Caching, like transformations, is lazy:

    • When you call cache, the RDD's storageLevel is set to MEMORY_ONLY
    • When you call cache again, it's set to the same value (no change)
    • Upon evaluation, when the underlying RDD is materialized, Spark checks the RDD's storageLevel and, if it requires caching, caches it.

    So you're safe.
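
    As a quick illustration (a sketch, assuming a spark-shell session where sc is available):

        import org.apache.spark.storage.StorageLevel

        val rdd = sc.parallelize(1 to 100)

        rdd.cache()   // storage level is now MEMORY_ONLY
        rdd.cache()   // calling cache again leaves it unchanged
        println(rdd.getStorageLevel == StorageLevel.MEMORY_ONLY)  // true

        rdd.count()   // first action materializes and caches the RDD
        rdd.count()   // later actions read the cached partitions

    Note that asking for a different storage level on an RDD that already has one (e.g. persist(StorageLevel.DISK_ONLY) after cache()) does throw an UnsupportedOperationException; repeating the same level, as above, does not.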
