Question:
For Spark's RDD objects this is quite trivial, as RDD exposes a getStorageLevel method, but DataFrame does not seem to expose anything similar. Anyone?
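For context, the "trivial" RDD check referred to above looks roughly like this (a minimal sketch; the file path is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("rdd-example"))
val rdd = sc.textFile("src/main/resources/sales.csv")

// any level other than StorageLevel.NONE means the RDD is marked as persisted
println(rdd.getStorageLevel == StorageLevel.NONE) // true: not persisted yet
rdd.cache()
println(rdd.getStorageLevel == StorageLevel.NONE) // false: now marked as persisted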
Answer 1:
You can check whether a DataFrame is cached or not using the Catalog (org.apache.spark.sql.catalog.Catalog), which was introduced in Spark 2.
Code example:
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .master("local")
  .appName("example")
  .getOrCreate()

val df = sparkSession.read.csv("src/main/resources/sales.csv")
df.createTempView("sales")

// interact with the catalog
val catalog = sparkSession.catalog

// print the databases
catalog.listDatabases().select("name").show()

// print all the tables
catalog.listTables().select("name").show()

// check whether the "sales" view is cached, before and after caching
println(catalog.isCached("sales")) // false
df.cache()
println(catalog.isCached("sales")) // true
Using the above code you can list all the tables and check whether a table is cached or not.
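As a small extension (not part of the original answer), unpersisting the DataFrame should be reflected by the catalog as well, since the temp view shares the DataFrame's logical plan:

// assuming df, catalog, and the "sales" view from the example above
df.unpersist()
println(catalog.isCached("sales")) // false again after unpersisting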
Source: https://stackoverflow.com/questions/41238986/how-to-check-if-a-dataframe-was-already-cached-persisted-before