I'm building a generic function which receives an RDD and does some calculations on it. Since I run more than one calculation on the input RDD, I would like to cache it. For
Nothing. If you call cache on an already cached RDD, nothing happens: the RDD will still be cached only once. Caching, like many other transformations, is lazy:
- When you call cache, the RDD's storageLevel is set to MEMORY_ONLY.
- When you call cache again, it's set to the same value (no change).
- When the RDD is evaluated, Spark checks its storageLevel and, if it requires caching, caches it.
So you're safe.
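
The lazy behaviour described above can be sketched with a small toy model. This is not Spark code and not how Spark is implemented internally: ToyRDD, compute_count and the string constants are hypothetical names made up for illustration, and only the idea (cache just records a storage level, and the actual caching happens at evaluation) comes from the answer.

```python
# Toy model of lazy caching. NOT Spark's real implementation:
# ToyRDD and all of its attributes are hypothetical, for illustration only.
MEMORY_ONLY = "MEMORY_ONLY"  # stands in for Spark's MEMORY_ONLY storage level
NONE = "NONE"                # stands in for "no caching requested"


class ToyRDD:
    """Hypothetical stand-in for an RDD: wraps a function that builds the data."""

    def __init__(self, compute):
        self._compute = compute    # how to (re)build the data
        self.storage_level = NONE  # nothing requested yet
        self._cached = None        # materialized data, once cached
        self.compute_count = 0     # how many times the data was rebuilt

    def cache(self):
        # Only records the requested level; no data is computed here.
        self.storage_level = MEMORY_ONLY
        return self

    def collect(self):
        # Evaluation: reuse the cached data if it is already there.
        if self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = self._compute()
        # Check the storage level and keep the data only if caching was requested.
        if self.storage_level != NONE:
            self._cached = data
        return data


rdd = ToyRDD(lambda: [x * x for x in range(5)])
rdd.cache()
rdd.cache()             # second call sets the same level: no change
first = rdd.collect()   # computes the data and caches it
second = rdd.collect()  # served from the cache, no recomputation
print(rdd.storage_level, rdd.compute_count, first == second)
```

In this sketch, calling cache twice leaves the object in exactly the same state as calling it once, and the data is computed a single time no matter how many evaluations follow.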