I just upgraded from Spark 2.0.2 to Spark 2.1.0 (by downloading the prebuilt version for Hadoop 2.7 and later). No Hive is installed.
Upon launch of the spark-shell, the metastore_db/ folder and derby.log file are created at the launch location, together with a bunch of warning logs (which were not printed in the previous version).
Closer inspection of the debug logs shows that Spark 2.1.0 tries to initialise a HiveMetastoreConnection:
17/01/13 09:14:44 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
Similar debug logs for Spark 2.0.2 do not show any initialisation of HiveMetastoreConnection.
Is this intended behaviour? Could it be related to the fact that spark.sql.warehouse.dir is now a static configuration shared among sessions? How do I avoid this, since I have no Hive installed?
Thanks in advance!
From the Spark 2.1.0 documentation:
When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started. Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse.
Since you do not have Hive installed, you will not have a hive-site.xml config file, so the warehouse location must be defaulting to the current directory.
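If you want to control where that directory ends up, you can override spark.sql.warehouse.dir at launch. A minimal example (the /tmp path is just an arbitrary choice):

spark-shell --conf spark.sql.warehouse.dir=/tmp/spark-warehouse

Note that this only affects the spark-warehouse directory; the metastore_db folder is governed by a separate Derby setting, as a later answer explains.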
If you are not planning to use HiveContext in Spark, you could reinstall Spark 2.1.0 from source, rebuilding it with Maven and making sure you omit the -Phive -Phive-thriftserver flags that enable Hive support.
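For reference, such a build could look something like the following (a sketch based on Spark's standard build instructions; the Hadoop/YARN profiles are just examples for a typical environment, and the Hive flags are simply left out):

./dev/make-distribution.sh --name no-hive --tgz -Pyarn -Phadoop-2.7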
For future googlers: the actual underlying reason for the creation of metastore_db and derby.log in every working directory is the default value of derby.system.home.
This can be changed in spark-defaults.conf; see here.
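In practice that means passing the property to the driver JVM. For example, a line like the following in conf/spark-defaults.conf pins Derby's files to one fixed location (/tmp/derby is just an illustrative path):

spark.driver.extraJavaOptions -Dderby.system.home=/tmp/derby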
This happens with Spark 1.6 as well. You can change the path by passing an extra option to spark-submit:
-Dderby.system.home=/tmp/derby
(or via derby.properties; there are several ways to change it).
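As a concrete example of the spark-submit variant (the JAR name is a placeholder and the Derby path is arbitrary):

spark-submit --driver-java-options "-Dderby.system.home=/tmp/derby" your-app.jar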
Source: https://stackoverflow.com/questions/41633084/prebuilt-spark-2-1-0-creates-metastore-db-folder-and-derby-log-when-launching-sp