Set Up a Standalone Hive Metastore Service for Presto and AWS S3

花落未央 2021-02-04 11:07

I'm working in an environment where I have an S3 service being used as a data lake, but not AWS Athena. I'm trying to set up Presto to be able to query the data in S3 and I kno

4 Answers
  • 2021-02-04 11:37

    The Metastore is now available as a standalone package (/hive-standalone-metastore-3.0.0/) in the Apache Hive distribution.

    Beginning in Hive 3.0, the Metastore is released as a separate package and can be run without the rest of Hive. This is referred to as standalone mode.

    By default the Metastore is configured for use with Hive, so a few configuration parameters have to be changed to run it in standalone mode:

    metastore.task.threads.always -> org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask
    metastore.expression.proxy -> org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy
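
A minimal sketch of how the two overrides above could be written out as a Hadoop-style configuration file. It assumes the standalone Metastore reads its settings from a `metastore-site.xml` file in the usual `<configuration>/<property>` layout; the helper name `render_site_xml` is hypothetical and only the two property names and values come from the answer.

```python
import xml.etree.ElementTree as ET

# The two overrides quoted in the answer above (name -> value).
OVERRIDES = {
    "metastore.task.threads.always": (
        "org.apache.hadoop.hive.metastore.events.EventCleanerTask,"
        "org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask"
    ),
    "metastore.expression.proxy": (
        "org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy"
    ),
}


def render_site_xml(properties):
    """Render a dict of properties as Hadoop-style configuration XML.

    Hypothetical helper: the target file name (metastore-site.xml) and
    its location are assumptions, not something the answer states.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(OVERRIDES))
```

The printed XML would still need to be merged with whatever database connection settings your Metastore requires; those are not covered here.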
    


  • 2021-02-04 11:39

    Needing to set up Hive just for the metastore does seem cumbersome. Have you considered using the AWS Glue Data Catalog instead? This way you won't have to manage anything. You can find detailed information here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html

  • 2021-02-04 11:41

    I was able to integrate with AWS S3 using Presto SQL and HMS 3.0. I did a write-up, if it helps: https://www.linkedin.com/pulse/presto-sql-s3-abhishek-gupta

  • 2021-02-04 11:49

    There is a workaround that lets you run Presto without Hive. I haven't tried it with a distributed file system like S3, but the code suggests it should work (at least with HDFS). In my opinion it is worth trying, because you do not need any new Docker image for Hive at all.

    The idea is to use the built-in FileHiveMetastore. It is neither documented nor advised for production use, but you could play with it. Schema information is stored next to the data in the file system. Obviously, it has its pros and cons. I do not know the details of your use case, so I don't know if it fits your needs.

    Configuration:

    connector.name=hive-hadoop2
    hive.metastore=file
    hive.metastore.catalog.dir=file:///tmp/hive_catalog
    hive.metastore.user=cox
    

    Demo:

    presto:tiny> create schema hive.default;
    CREATE SCHEMA
    presto:tiny> use hive.default;
    USE
    presto:default> create table t (t bigint);
    CREATE TABLE
    presto:default> show tables;
     Table
    -------
     t
    (1 row)
    
    Query 20180223_202609_00009_iuchi, FINISHED, 1 node
    Splits: 18 total, 18 done (100.00%)
    0:00 [1 rows, 18B] [11 rows/s, 201B/s]
    
    presto:default> insert into t (values 1);
    INSERT: 1 row
    
    Query 20180223_202616_00010_iuchi, FINISHED, 1 node
    Splits: 51 total, 51 done (100.00%)
    0:00 [0 rows, 0B] [0 rows/s, 0B/s]
    
    presto:default> select * from t;
     t
    ---
     1
    (1 row)
    

    After the above I was able to find the following on my machine:

    /tmp/hive_catalog/
    /tmp/hive_catalog/default
    /tmp/hive_catalog/default/t
    /tmp/hive_catalog/default/t/.prestoPermissions
    /tmp/hive_catalog/default/t/.prestoPermissions/user_cox
    /tmp/hive_catalog/default/t/.prestoPermissions/.user_cox.crc
    /tmp/hive_catalog/default/t/.20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278.crc
    /tmp/hive_catalog/default/t/.prestoSchema
    /tmp/hive_catalog/default/t/20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278
    /tmp/hive_catalog/default/t/..prestoSchema.crc
    /tmp/hive_catalog/default/.prestoSchema
    /tmp/hive_catalog/default/..prestoSchema.crc
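
To make the layout above easier to inspect, here is a rough sketch that walks a file-based catalog directory and reports which schemas and tables it appears to contain. It assumes, based only on the listing above, that a directory holding a `.prestoSchema` file directly under the catalog root is a schema and that a subdirectory with its own `.prestoSchema` is a table; the function name `list_tables` is hypothetical and this is not part of Presto.

```python
import os

# Marker file name taken from the directory listing above.
SCHEMA_MARKER = ".prestoSchema"


def list_tables(catalog_dir):
    """Return {schema: [tables]} inferred from .prestoSchema marker files.

    Assumption: the marker-file convention is guessed from one listing,
    not from documented FileHiveMetastore behavior.
    """
    result = {}
    for schema in sorted(os.listdir(catalog_dir)):
        schema_path = os.path.join(catalog_dir, schema)
        # Skip anything that does not carry the schema marker.
        if not os.path.isfile(os.path.join(schema_path, SCHEMA_MARKER)):
            continue
        tables = []
        for entry in sorted(os.listdir(schema_path)):
            # A table is a subdirectory with its own marker file.
            if os.path.isfile(os.path.join(schema_path, entry, SCHEMA_MARKER)):
                tables.append(entry)
        result[schema] = tables
    return result


if __name__ == "__main__":
    catalog = "/tmp/hive_catalog"  # path used in the configuration above
    if os.path.isdir(catalog):
        print(list_tables(catalog))
```

Because the metadata lives in ordinary files next to the data, a quick check like this can confirm what the catalog holds without starting Presto.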
    