Question
I have an instance of org.apache.spark.rdd.RDD[MyClass]. How can I programmatically check whether the instance is persisted in memory?
Answer 1:
You want RDD.getStorageLevel. It returns StorageLevel.NONE if the RDD is not persisted. However, that only tells you whether the RDD is marked for caching, not whether it has actually been cached. If you want the actual status, you can use the developer APIs sc.getRDDStorageInfo or sc.getPersistentRDDs.
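As a sketch of the difference, a spark-shell session might look like the following (the RDD name myrdd is just an example; actual output such as memory sizes will vary with your data and cluster):

scala> myrdd.getStorageLevel == org.apache.spark.storage.StorageLevel.NONE
res0: Boolean = false

scala> // Actual cached state: entries appear here only once data has been materialized
scala> sc.getRDDStorageInfo.foreach(info => println(info.name + " " + info.memSize))

scala> // Map of RDD id -> RDD for everything currently marked persistent
scala> sc.getPersistentRDDs.keys

Note that an RDD marked with cache() will not show up in sc.getRDDStorageInfo with a nonzero memory size until an action (e.g. count()) has forced it to be computed and stored.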
Answer 2:
You can call rdd.getStorageLevel.useMemory to check whether it is marked for in-memory storage, as follows:
scala> myrdd.getStorageLevel.useMemory
res3: Boolean = false
scala> myrdd.cache()
res4: myrdd.type = MapPartitionsRDD[2] at filter at <console>:29
scala> myrdd.getStorageLevel.useMemory
res5: Boolean = true
Source: https://stackoverflow.com/questions/30688145/how-to-check-if-spark-rdd-is-in-memory