Prebuilt Spark 2.1.0 creates metastore_db folder and derby.log when launching spark-shell


From Spark 2.1.0 documentation pages:

When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started. Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse.

Since you do not have Hive installed, you will not have a hive-site.xml config file, so Spark falls back to creating metastore_db in the current directory.
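Alternatively, you can point the warehouse somewhere explicit when you build the session. A minimal sketch (the /tmp/spark-warehouse path and app name are just illustrative choices):

import org.apache.spark.sql.SparkSession

// Point the warehouse at an explicit, writable location instead of
// letting it default to ./spark-warehouse in the launch directory.
val spark = SparkSession.builder()
  .appName("WarehouseDirExample")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
  .getOrCreate()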

If you are not planning to use HiveContext in Spark, you could rebuild Spark 2.1.0 from source with Maven, making sure you omit the -Phive -Phive-thriftserver flags that enable Hive support; a sketch of the build command follows.
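As a sketch, a stock Maven build without Hive support looks something like this (the -Pyarn and -Phadoop-2.7 profiles are illustrative and depend on your cluster; the point is the absence of the Hive profiles):

# Build Spark 2.1.0 without Hive support: leave out -Phive -Phive-thriftserver.
./build/mvn -Pyarn -Phadoop-2.7 -DskipTests clean package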

hiryu

For future googlers: the actual underlying reason for the creation of metastore_db and derby.log in every working directory is the default value of derby.system.home.

This can be changed in spark-defaults.conf, as sketched below.
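A minimal sketch of such an entry in conf/spark-defaults.conf (the /tmp/derby path is just an example location; pick any directory the driver can write to):

# Make Derby keep metastore_db and derby.log in a fixed directory
# instead of the current working directory.
spark.driver.extraJavaOptions -Dderby.system.home=/tmp/derby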

This happens with Spark 1.6 as well. You can change the path by passing an extra JVM option when submitting with spark-submit:

-Dderby.system.home=/tmp/derby

(or via derby.properties; there are several ways to change it). A sketch of the full spark-submit form is below.
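For instance, a minimal sketch of passing the option on the command line (your-app.jar stands in for whatever application you actually submit):

# Hand the Derby system-home property to the driver JVM at submit time.
spark-submit \
  --driver-java-options "-Dderby.system.home=/tmp/derby" \
  your-app.jar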
