Question
We just migrated from Parquet to Databricks Delta, using the Hive metastore. So far everything seems to work fine. When I print the location of the new Delta table with DESCRIBE EXTENDED my_table, the location shown is correct, although it differs from the one found in the hiveMetastore database. When I query the hiveMetastore database I can successfully identify the target table (and the provider is correctly set to delta). To retrieve that information I join the SDS, DBS, TBLS and TABLE_PARAMS tables of the hiveMetastore db, filtering by table name, as shown next:
import spark.implicits._   // provides the $"..." and 'col column syntax (pre-imported in Databricks notebooks)

// Read the relevant Hive metastore tables over JDBC
val sdsDF = spark.read
  .format("jdbc")
  .option("url", activeConnection.url)
  .option("dbtable", "hiveMetastore.SDS")
  .option("user", activeConnection.user)
  .option("password", activeConnection.pwd)
  .load()

val tblsDf = spark.read
  .format("jdbc")
  .option("url", activeConnection.url)
  .option("dbtable", "hiveMetastore.TBLS")
  .option("user", activeConnection.user)
  .option("password", activeConnection.pwd)
  .load()

val dbsDf = spark.read
  .format("jdbc")
  .option("url", activeConnection.url)
  .option("dbtable", "hiveMetastore.DBS")
  .option("user", activeConnection.user)
  .option("password", activeConnection.pwd)
  .load()

val paramsDf = spark.read
  .format("jdbc")
  .option("url", activeConnection.url)
  .option("dbtable", "hiveMetastore.TABLE_PARAMS")
  .option("user", activeConnection.user)
  .option("password", activeConnection.pwd)
  .load()

// Join storage descriptors, databases, tables and table parameters,
// then keep only the rows for the table of interest
val resDf = sdsDF.join(tblsDf, "SD_ID")
  .join(dbsDf, "DB_ID")
  .join(paramsDf, "TBL_ID")
  .where('TBL_NAME.rlike("mytable"))
  .select($"TBL_NAME", $"TBL_TYPE", $"NAME".as("DB_NAME"), $"DB_LOCATION_URI",
          $"LOCATION".as("TABLE_LOCATION"), $"PARAM_KEY", $"PARAM_VALUE")
All of the above is executed from a Databricks notebook.
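For completeness, the Delta-side location mentioned above was read with something along these lines (a sketch; my_table stands in for the real table name):

// Sketch: the location Spark itself reports for the Delta table
spark.sql("DESCRIBE EXTENDED my_table")
  .filter("col_name = 'Location'")
  .show(false)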
My question is: why am I getting two different locations even though the table name is the same? And where is the correct location of a Delta table stored, if not in the hiveMetastore db?
Source: https://stackoverflow.com/questions/60614701/where-is-the-delta-table-location-stored