apache-hive

Adding Local Files in Beeline (Hive)

Submitted by China☆狼群 on 2021-02-10 17:42:56
Question: I'm trying to add local files via the Beeline client, but I keep running into an issue where it tells me the file does not exist.
[test@test-001 tmp]$ touch /tmp/m.py
[test@test-001 tmp]$ stat /tmp/m.py
  File: ‘/tmp/m.py’
  Size: 0           Blocks: 0          IO Block: 4096   regular empty file
Device: 801h/2049d  Inode: 34091464    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1036/ test)   Gid: ( 1037/ test)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2017-02-27 22:04:06.527970709 +0000
Modify: 2017-02-27 22
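In Beeline, ADD FILE with a local path is typically resolved on the HiveServer2 host rather than on the machine running the Beeline client, which is why a file that exists only on the client can be reported as missing. A minimal sketch of one workaround, assuming the file is first copied to HDFS (paths are illustrative):
[test@test-001 tmp]$ hdfs dfs -put /tmp/m.py /tmp/m.py
-- then, inside the Beeline session:
ADD FILE hdfs:///tmp/m.py;
LIST FILES;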

Apache Spark's deployment issue (cluster-mode) with Hive

Submitted by 被刻印的时光 ゝ on 2019-12-25 04:45:29
Question: EDIT: I'm developing a Spark application that reads data from multiple structured schemas, and I'm trying to aggregate the information from those schemas. My application runs well when I run it locally, but when I run it on a cluster I have trouble with the configuration (most probably with hive-site.xml) or with the submit-command arguments. I've looked at other related posts but couldn't find a solution SPECIFIC to my scenario. I've mentioned what commands I tried and
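In cluster mode the driver runs inside the cluster, so a hive-site.xml that only exists on the submitting machine is not automatically visible to it. A minimal sketch of a submit command that ships the file along with the application, assuming YARN; the class and jar names are illustrative:
spark-submit --class com.example.MyApp \
  --master yarn --deploy-mode cluster \
  --files /etc/hive/conf/hive-site.xml \
  myapp.jar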

How do I access HBase table in Hive & vice-versa?

Submitted by 心已入冬 on 2019-12-12 10:37:47
Question: As a developer, I've created an HBase table for our project by importing data from an existing MySQL table with a Sqoop job. The problem is that our data analyst team is familiar with MySQL syntax, which means they could query a Hive table easily. For them, I need to expose the HBase table in Hive. I don't want to duplicate the data by populating it again in Hive, and duplicating data might cause consistency issues in the future. Can I expose the HBase table in Hive without duplicating data? If yes, how do I do it? Also,
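Yes: Hive can map an existing HBase table through the HBase storage handler, so the data stays in HBase and nothing is copied. A minimal sketch, assuming an HBase table named 'employees' with a column family 'cf' (all names are illustrative):
CREATE EXTERNAL TABLE hbase_employees (key INT, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "employees");
Queries against hbase_employees are then served directly from HBase, so there is a single copy of the data.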

Hive JDBC error: java.lang.NoSuchFieldError: HIVE_CLI_SERVICE_PROTOCOL_V7

Submitted by 梦想与她 on 2019-12-12 04:34:25
Question: I'm trying to create a connection to Impala via JDBC using the Hive2 connector, but I'm getting this error:
Exception in thread "main" java.lang.NoSuchFieldError: HIVE_CLI_SERVICE_PROTOCOL_V7
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:175)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at dsnoc.dsnoc_api.dolar
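A NoSuchFieldError raised while HiveConnection is being constructed usually points to mixed Hive client JAR versions on the classpath (for example, an older hive-service jar next to a newer hive-jdbc), so the first check is that every Hive JAR comes from the same release. For reference, a connection string that is commonly used when going through the Hive2 driver to Impala, assuming the default Impala port and no SASL (the host name is illustrative):
beeline -d org.apache.hive.jdbc.HiveDriver -u "jdbc:hive2://impala-host:21050/;auth=noSasl"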

Apache hive MSCK REPAIR TABLE new partition not added

Submitted by 大兔子大兔子 on 2019-12-05 08:09:01
Question: I am new to Apache Hive. While working with external table partitions, if I add a new partition directly in HDFS, the new partition is not added after running MSCK REPAIR TABLE. Below are the commands I tried:
-- creating external table
hive> create external table factory(name string, empid int, age int) partitioned by(region string)
    > row format delimited fields terminated by ',';
-- Detailed Table Information
Location:           hdfs://localhost.localdomain:8020/user/hive/warehouse/factory
Table Type:         EXTERNAL_TABLE
Table Parameters:
    EXTERNAL                TRUE
    transient_lastDdlTime   1438579844
-- creating directory in HDFS
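MSCK REPAIR TABLE only discovers directories under the table's LOCATION whose names follow the <partition_key>=<value> convention (for example region=south); directories added with any other name or path are ignored. In that case the partition can be registered explicitly. A minimal sketch, with an illustrative partition value and directory:
hive> alter table factory add partition(region='south') location '/user/hive/warehouse/factory/southregion';
hive> show partitions factory;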

Spark SQL on ORC files doesn't return correct Schema (Column names)

Submitted by 我的未来我决定 on 2019-11-29 15:27:31
Question: I have a directory containing ORC files. I am creating a DataFrame using the code below:
var data = sqlContext.sql("SELECT * FROM orc.`/directory/containing/orc/files`");
It returns a data frame with this schema: [_col0: int, _col1: bigint], whereas the expected schema is [scan_nbr: int, visit_nbr: bigint]. When I query files in Parquet format I get the correct schema. Am I missing any configuration(s)?
Adding more details: this is Hortonworks Distribution HDP 2.4.2 (Spark 1.6.1, Hadoop 2.7.1, Hive 1.2.1). We haven't changed the default configurations of HDP, but this is definitely not the same as the
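ORC files written by older Hive versions record only placeholder column names (_col0, _col1, ...) in the file footer, so reading the files directly surfaces those names; the real column names live in the Hive metastore. One hedged workaround is to define a table over the directory and query it through the metastore. The sketch below assumes illustrative table and column names that match the files' layout:
hive> create external table scans(scan_nbr int, visit_nbr bigint)
    > stored as orc
    > location '/directory/containing/orc/files';
-- then, from Spark: sqlContext.sql("SELECT * FROM scans") returns the named columns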
