hive-metastore | 易学教程

Setup Standalone Hive Metastore Service For Presto and AWS S3

Question: I'm working in an environment where an S3 service is used as a data lake, but without AWS Athena. I'm trying to set up Presto to query the data in S3, and I know I need to define the data structure as Hive tables through the Hive Metastore service. I'm deploying each component in Docker, so I'd like to keep the container size as small as possible. What components from Hive do I need to be able to just run the Metastore service? I don't really actually care about running Hive

How to pass multiple column in partitionby method in Spark

Question: I am a newbie in Spark. I want to write DataFrame data into a Hive table. The Hive table is partitioned on multiple columns. Through the Hive Metastore client, I am getting the partition columns and passing them as a variable in the partitionBy clause of the DataFrame write method. var1="country","state" (getting the partition column names of the Hive table) dataframe1.write.partitionBy(s"$var1").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/") When I execute the above code, it is giving me
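A likely cause, sketched here under assumptions: partitionBy takes a variable number of column names, so interpolating the whole list into one string passes a single column name that does not exist. In Scala the usual fix is to build a Seq of names and expand it with `cols: _*`; the Python sketch below shows the same idea for PySpark's `partitionBy(*cols)`. The helper `parse_partition_columns` and the stand-in `partition_by` are hypothetical names used only for illustration, and the input is assumed to arrive as one comma-separated string.

```python
def parse_partition_columns(raw: str) -> list[str]:
    """Split a string such as '"country","state"' into ['country', 'state']."""
    # Drop surrounding whitespace and quote characters from each name.
    return [name.strip().strip("\"'") for name in raw.split(",") if name.strip()]


def partition_by(*cols: str) -> list[str]:
    """Hypothetical stand-in with the same varargs shape as
    PySpark's DataFrameWriter.partitionBy(*cols); it only echoes its arguments."""
    return list(cols)


raw = '"country","state"'  # assumed form of the value read from the metastore

# Passing the raw string gives ONE argument, i.e. one bogus column name.
single = partition_by(raw)

# Unpacking the parsed list gives one argument per partition column.
multiple = partition_by(*parse_partition_columns(raw))

print(single)
print(multiple)
```

With a real DataFrame the call would take the form `df.write.partitionBy(*parse_partition_columns(raw))`, assuming the parsed names match the DataFrame's column names exactly.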

Cant save table to hive metastore, HDP 3.0

Question: I can't save a table to the Hive database anymore using the metastore. I see the tables in Spark using spark.sql, but I can't see the same tables in the Hive database. I tried this, but it doesn't store the table in Hive. How can I configure the Hive metastore? The Spark version is 2.3.1. If you want more details, please comment. %spark import org.apache.spark.sql.SparkSession val spark = (SparkSession .builder .appName("interfacing spark sql to hive metastore without configuration file") .config("hive

Can we predict the order of the results of a Hive SELECT * query?

Question: Is it possible that the order of the results of a SELECT * query (no ORDER BY) is always the same, provided that the same DBMS is used as the Metastore? So, as long as MySQL is used as the Metastore, the order of the results for a SELECT *; query will always be the same. If Postgres is used, the order will always be the same on the same data, but different from when MySQL is used. I am talking about the same data. Maybe it all boils down to the question of what is the default order of results and why

How to handle potential data loss when performing comparisons across data types in different groups

Question: Background: Our group is going through a Cloudera upgrade to 6.1.1, and I have been tasked with determining how to handle the loss of implicit data type conversion across data types. See the link below for the relevant release note details. https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_611_incompatible_changes.html#hive_union_all_returns_incorrect_data Not only does this issue affect UNION ALL queries, but there is a function that performs comparisons on

HiveMetaStoreClient fails to connect to a Kerberized cluster

Question: Kerberized HDP-2.6.3.0. I have test code running on my local Windows 7 machine. Note the commented code as well; it isn't making a difference. private static void connectHiveMetastore() throws MetaException, MalformedURLException { System.setProperty("hadoop.home.dir", "E:\\Development\\Software\\Virtualization"); /*Start : Commented or un-commented, immaterial ...*/ System.setProperty("javax.security.auth.useSubjectCredsOnly","false"); System.setProperty("java.security.auth.login.config

How to compare two columns with different data type groups

Question: This is an extension of a question I posed yesterday: How to handle potential data loss when performing comparisons across data types in different groups. In Hive, is it possible to perform comparisons between two columns that are in different data type groups inline within the SELECT clause? I need to first determine what the incoming metadata is for each column and then provide logic that picks which CAST to use. CASE WHEN Column1 <=> Column2 THEN 0 -- Error occurs here if data types are in
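One way to picture the "look up the metadata, then pick a CAST" step is to generate the comparison expression outside the query. The Python sketch below is only an illustration under assumptions: the grouping table follows the type families listed in the Hive language manual, the function names `type_group` and `null_safe_compare` are hypothetical, and falling back to CAST ... AS STRING is one possible policy, not necessarily the right one for every pair (for example, 1 and 1.0 compare as different strings).

```python
# Assumed mapping of Hive base types to type groups (illustrative, not exhaustive).
TYPE_GROUPS = {
    "numeric": {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"},
    "string": {"string", "varchar", "char"},
    "datetime": {"timestamp", "date"},
    "misc": {"boolean", "binary"},
}


def type_group(hive_type: str) -> str:
    """Return the group name for a Hive type such as 'decimal(10,2)'."""
    base = hive_type.lower().split("(")[0].strip()  # drop precision/length
    for group, members in TYPE_GROUPS.items():
        if base in members:
            return group
    raise ValueError(f"unrecognized Hive type: {hive_type}")


def null_safe_compare(col_a: str, type_a: str, col_b: str, type_b: str) -> str:
    """Build a null-safe (<=>) comparison, casting both sides to STRING
    only when the two columns fall into different type groups."""
    if type_group(type_a) == type_group(type_b):
        return f"{col_a} <=> {col_b}"
    return f"CAST({col_a} AS STRING) <=> CAST({col_b} AS STRING)"


print(null_safe_compare("Column1", "int", "Column2", "bigint"))
print(null_safe_compare("Column1", "decimal(10,2)", "Column2", "varchar(20)"))
```

The generated expression would then be spliced into the CASE WHEN clause of the query text before it is submitted, with the column types taken from the table metadata.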

Apache Spark 2.3.1 with Hive metastore 3.1.0

Question: We have upgraded our HDP cluster to 3.1.1.3.0.1.0-187 and have discovered: Hive has a new metastore location; Spark can't see Hive databases. In fact we see: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database ... not found Could you help me understand what has happened and how to solve this? Update: Configuration: (spark.sql.warehouse.dir,/warehouse/tablespace/external/hive/) (spark.admin.acls,) (spark.yarn.dist.files,file:///opt/folder/config.yml,file:///opt/jdk1.8.0_172

Issue with AWS Glue Data Catalog as Metastore for Spark SQL on EMR

Question: I have an AWS EMR cluster (v5.11.1) with Spark (v2.2.1) and am trying to use the AWS Glue Data Catalog as its metastore. I have followed the steps in the official AWS documentation (reference link below), but I am facing a discrepancy with regard to accessing the Glue Catalog databases/tables. Both the EMR cluster and AWS Glue are in the same account, and appropriate IAM permissions have been provided. AWS Documentation: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark