Multiple Spark applications with HiveContext

一个人的身影 2020-11-29 12:30

Running two separate PySpark applications that each instantiate a HiveContext in place of a SQLContext causes one of the two applications to fail with an error.

1 Answer
  • 2020-11-29 13:09

    By default Hive(Context) uses embedded Derby as a metastore. It is intended mostly for testing and supports only one active user. If you want to support multiple running applications you should configure a standalone metastore. At this moment Hive supports PostgreSQL, MySQL, Oracle and MS SQL Server as metastore backends. Details of the configuration depend on the backend and the option chosen (local / remote), but generally speaking you'll need:

    • a running RDBMS server
    • a metastore database created using provided scripts
    • a proper Hive configuration
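
    As a sketch, a MySQL-backed metastore could be wired up with a hive-site.xml along these lines (the host, database name, and credentials below are placeholders, and the exact set of properties depends on your Hive version and on whether the metastore is local or remote):

    ```xml
    <configuration>
      <!-- JDBC connection to the standalone metastore database
           (placeholder host, database, and credentials) -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://metastore-host:3306/metastore</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive_password</value>
      </property>
      <!-- For a remote metastore service, clients point at it instead -->
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://metastore-host:9083</value>
      </property>
    </configuration>
    ```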

    Cloudera provides a comprehensive guide you may find useful: Configuring the Hive Metastore.

    Theoretically it should also be possible to create separate Derby metastores with a proper configuration (see Hive Admin Manual - Local/Embedded Metastore Database) or to use Derby in Server Mode.

    For development you can start the applications in different working directories. This creates a separate metastore_db for each application and avoids the issue of multiple active users. Providing a separate Hive configuration should work as well but is less convenient in development:

    When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory
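
    The working-directory trick can be illustrated without Spark at all: when no hive-site.xml overrides the location, the embedded Derby metastore path is resolved relative to the process's current directory. The helper below is purely illustrative (not a Spark API); it just mirrors that path resolution:

    ```python
    import os
    import tempfile

    def embedded_metastore_path(workdir):
        """Illustrative helper: where an embedded Derby metastore would be
        created for a process started in workdir, absent a hive-site.xml."""
        return os.path.join(workdir, "metastore_db")

    # Two applications launched from different working directories...
    app_a_dir = tempfile.mkdtemp(prefix="spark_app_a_")
    app_b_dir = tempfile.mkdtemp(prefix="spark_app_b_")

    # ...end up with independent metastore_db directories, so they never
    # contend for Derby's single active connection.
    print(embedded_metastore_path(app_a_dir))
    print(embedded_metastore_path(app_b_dir))
    ```

    In practice this just means launching each spark-submit (or pyspark shell) from its own directory.
    
    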
