hive-partitions

Can I move data from one hive partition to another partition of the same table

◇◆丶佛笑我妖孽 submitted on 2020-01-16 18:36:05
Question: My table is partitioned by year/month/date. Using the week-year pattern (YYYY) in SimpleDateFormat created a wrong partition: the data for the date 2017-12-31 was written to 2018-12-31. SimpleDateFormat sdf = new SimpleDateFormat("YYYY-MM-dd"); What I want is to move my data from partition 2018/12/31 to 2017/12/31 of the same table. I did not find any relevant documentation on how to do this.
Answer 1: From what I understood, you would like to move the data from 2018-12-31 partition to
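The cause described in the question can be reproduced outside Java. The sketch below is only an illustration: it assumes the JVM used US-style week rules (weeks start on Sunday, and the week containing January 1 is week 1), which is what makes the YYYY pattern print 2018 for 2017-12-31. The helper names `first_week_start` and `week_year` are hypothetical and not part of any library.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6
MONDAY = 0


def first_week_start(year, first_day=SUNDAY, min_days=1):
    """Return the first day of week 1 of `year` under the given week rules."""
    jan1 = date(year, 1, 1)
    # Days between the start of the week containing January 1 and January 1.
    offset = (jan1.weekday() - first_day) % 7
    week_start = jan1 - timedelta(days=offset)
    # That week is week 1 only if enough of its days fall inside `year`.
    if 7 - offset >= min_days:
        return week_start
    return week_start + timedelta(days=7)


def week_year(d, first_day=SUNDAY, min_days=1):
    """Return the week year of `d`, i.e. what a week-year pattern prints."""
    if d >= first_week_start(d.year + 1, first_day, min_days):
        return d.year + 1
    if d >= first_week_start(d.year, first_day, min_days):
        return d.year
    return d.year - 1


d = date(2017, 12, 31)
print(d.year)                   # 2017, the calendar year (pattern yyyy)
print(week_year(d))             # 2018, week year under the assumed US-style rules
print(week_year(d, MONDAY, 4))  # 2017, week year under ISO 8601 rules
```

Because the result depends on the locale's week rules, the same code can produce correct partitions on one machine and wrong ones on another; formatting with the calendar-year pattern yyyy avoids the problem.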

How to truncate a partitioned external table in hive?

随声附和 submitted on 2020-01-01 06:44:08
Question: I'm planning to truncate a Hive external table which has one partition. I used the following command to truncate the table: hive> truncate table abc; But it throws an error stating: Cannot truncate non-managed table abc. Can anyone suggest how to resolve this? ...
Answer 1: Make your table MANAGED first: ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE'); Then truncate: truncate table abc; And finally you can make it external again: ALTER TABLE abc SET

hive setting hive.optimize.sort.dynamic.partition

Deadly submitted on 2019-12-25 03:36:12
Question: I am trying to insert into a Hive table with dynamic partitions. The same query has been running fine for the last few days, but it is now giving the error below. Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from

How to check whether a partition exists with hive

拜拜、爱过 submitted on 2019-12-24 09:05:57
Question: I have a HiveQL script that performs some operations on a Hive table. Before doing these operations, I want to check whether the required partition exists and, if not, terminate the script. How can I achieve this?
Answer 1: Using shell:
table_name="schema.table"
partition_spec="key=value"
partition_exists=$(hive -e "show partitions $table_name" | grep "$partition_spec")
# check partition_exists
if [ "$partition_exists" = "" ]; then echo not exists; else echo exists; fi
Source: https:/
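One caveat with the grep approach is that it matches substrings, so a spec such as dt=2019-01-1 would also match dt=2019-01-10. A minimal sketch of an exact comparison follows; it assumes the text comes from `show partitions`, which prints one partition per line (multi-level partitions appear as key1=v1/key2=v2). The function name and the sample output are hypothetical.

```python
def partition_exists(show_partitions_output, partition_spec):
    """Return True if partition_spec is listed exactly in SHOW PARTITIONS output."""
    wanted = partition_spec.strip()
    for line in show_partitions_output.splitlines():
        # Compare whole lines so that a prefix of another partition never matches.
        if line.strip() == wanted:
            return True
    return False


# Hypothetical output of: hive -e "show partitions schema.table"
sample_output = "dt=2019-01-01\ndt=2019-01-10\ndt=2019-01-11\n"

print(partition_exists(sample_output, "dt=2019-01-10"))  # True
print(partition_exists(sample_output, "dt=2019-01-1"))   # False, although grep would match
```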

Insert data in many partitions using one insert statement

我是研究僧i submitted on 2019-12-24 07:35:22
Question: I have table A and table B, where B is a partitioned version of A, partitioned on a field called X. When I want to insert data from A into B, I usually execute the following statement: INSERT INTO TABLE B PARTITION(X=x) SELECT <columnsFromA> FROM A WHERE X=x Now I want to insert a range of X values, say x1, x2, x3... How can I achieve this in a single statement?
Answer 1: Use dynamic partition load: set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode
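As a rough sketch of what a dynamic partition load for several values could look like, the helper below assembles the statement as text. It assumes X is a string column, that the partition column is listed last in the SELECT (which dynamic partitioning requires), and that nonstrict mode is acceptable; `col1` and `col2` stand in for the question's <columnsFromA>, and both function names are hypothetical.

```python
def quote_literal(value):
    """Quote a value as a HiveQL string literal; reject embedded quotes."""
    if "'" in value or "\\" in value:
        raise ValueError("unsupported character in partition value: %r" % value)
    return "'" + value + "'"


def build_dynamic_insert(source, target, columns, partition_col, values):
    """Build one INSERT that loads several partitions via dynamic partitioning."""
    in_list = ", ".join(quote_literal(v) for v in values)
    # The dynamic partition column must come last in the SELECT list.
    select_list = ", ".join(list(columns) + [partition_col])
    return (
        "SET hive.exec.dynamic.partition=true;\n"
        "SET hive.exec.dynamic.partition.mode=nonstrict;\n"
        "INSERT INTO TABLE {t} PARTITION ({p})\n"
        "SELECT {s} FROM {a}\n"
        "WHERE {p} IN ({i});"
    ).format(t=target, p=partition_col, s=select_list, a=source, i=in_list)


statement = build_dynamic_insert("A", "B", ["col1", "col2"], "X", ["x1", "x2", "x3"])
print(statement)
```

Hive then routes each row to the partition named by its X value, so one statement covers all listed values.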

Hive: Add partitions for existing folder structure

夙愿已清 submitted on 2019-12-20 05:32:10
Question: I have a folder structure in HDFS like the one below. However, no partitions were actually created on the table using ALTER TABLE ADD PARTITION commands, even though the folder structure was set up as if the table had partitions. How can I automatically add all the partitions to the Hive table? (Hive 1.0, external table)
/user/frank/clicks.db
  /date=20190401
    /file0004.csv
  /date=20190402
    /file0009.csv
  /date=20190501
    /file0000.csv
    /file0001.csv
  ...etc
Answer 1: Use msck repair table command: MSCK [REPAIR]
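Where the repair command is not an option, the same result can be reached by issuing one ALTER TABLE ... ADD PARTITION statement per directory. The sketch below only generates those statements from a list of directory names; the table name `clicks` is an assumption (the question does not give it), the partition value is quoted as a string, and the column name is backquoted because `date` is a reserved word in later Hive versions.

```python
def add_partition_statements(table, base_path, dir_names):
    """Build ALTER TABLE ... ADD PARTITION statements for key=value directories."""
    base = base_path.rstrip("/")
    statements = []
    for name in sorted(dir_names):
        key, sep, value = name.partition("=")
        if not sep or not key or not value:
            continue  # skip entries that are not key=value partition directories
        statements.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION (`{k}`='{v}') "
            "LOCATION '{b}/{n}';".format(t=table, k=key, v=value, b=base, n=name)
        )
    return statements


# Directory names taken from the question; "clicks" is a hypothetical table name.
dirs = ["date=20190401", "date=20190402", "date=20190501", "_SUCCESS"]
for stmt in add_partition_statements("clicks", "/user/frank/clicks.db", dirs):
    print(stmt)
```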

pyspark - getting Latest partition from Hive partitioned column logic

房东的猫 submitted on 2019-12-18 09:15:06
Question: I am new to PySpark. I am trying to get the latest partition (date partition) of a Hive table using PySpark DataFrames, and have done it as shown below. But I am sure there is a better way to do it using DataFrame functions (not by writing SQL). Could you please share better ways to do this? This solution scans the entire data of the Hive table to get the result. df_1 = sqlContext.table("dbname.tablename"); df_1_dates = df_1.select('partitioned_date_column').distinct().orderBy(df_1['partitioned_date_column']
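One way to avoid the full scan is to work from partition names only, since those come from the metastore rather than from the data files. The sketch below is a plain-Python helper that picks the latest value from such names; it assumes the dates are stored in a format that sorts lexicographically (for example yyyy-MM-dd or yyyyMMdd). Obtaining the names through `SHOW PARTITIONS`, as in the comment, is an assumption about the setup and does rely on a SQL statement.

```python
def latest_partition(partition_names, column):
    """Return the largest value of `column` found in key=value partition names."""
    prefix = column + "="
    values = []
    for name in partition_names:
        # A multi-level name looks like "a=1/b=2"; inspect each level.
        for part in name.split("/"):
            if part.startswith(prefix):
                values.append(part[len(prefix):])
    return max(values) if values else None


# Assumed way to get the names in PySpark (not executed here):
# names = [r.partition for r in sqlContext.sql("SHOW PARTITIONS dbname.tablename").collect()]
names = [
    "partitioned_date_column=2019-11-30",
    "partitioned_date_column=2019-12-02",
    "partitioned_date_column=2019-12-01",
]
print(latest_partition(names, "partitioned_date_column"))  # 2019-12-02
```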

Hive Runtime Error: Unable to deserialize reduce input key

五迷三道 submitted on 2019-12-13 05:55:29
Question: I am trying to run an insert into a partitioned table with a query that involves GROUP BY: 'set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.execution.engine=tez; INSERT OVERWRITE TABLE table1 PARTITION (date) select col1,CONCAT(COALESCE(substr(Cdate,1,4),'-'),'',COALESCE(substr(Cdate,6,2),'-'),'',COALESCE(substr(Cdate,9,2),'-')),col3,col4,'mobile-data',data,date from (select col1,substr(CDate,1,10) as Cdate,u.col3 as col3,u.col4 as col4,date,sum(u.col5+u

How to insert/copy one partition's data to multiple partitions in hive?

核能气质少年 submitted on 2019-12-10 10:19:52
Question: I have data for day='2019-01-01' in my Hive table, and I want to copy the same data to the whole of January 2019 (i.e. to '2019-01-02', '2019-01-03' ... '2019-01-31'). I'm trying the following, but data is only inserted into '2019-01-02' and not into '2019-01-03'. INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03') SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';
Answer 1: Cross join all your data with calendar dates for the required date range. Use dynamic partitioning:
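The cross join idea can be sketched by generating the calendar in Python and embedding it as an inline subquery. This is one possible shape of the statement, not necessarily the one the answer goes on to give: it assumes dynamic partitioning is already enabled in nonstrict mode, that the Hive version accepts SELECT without FROM inside UNION ALL, and that strict checks do not block cartesian products. The helper names are hypothetical.

```python
from datetime import date, timedelta


def date_strings(start, end):
    """Return ISO date strings for every day from start to end inclusive."""
    count = (end - start).days + 1
    return [(start + timedelta(days=i)).isoformat() for i in range(count)]


def build_copy_statement(table, columns, partition_col, source_value, targets):
    """Build an INSERT that copies one partition into every target partition."""
    # Inline calendar: one SELECT per target date, glued with UNION ALL.
    calendar = " UNION ALL ".join(
        "SELECT '{0}' AS {1}".format(t, partition_col) for t in targets
    )
    select_list = ", ".join("s." + c for c in columns)
    return (
        "INSERT OVERWRITE TABLE {t} PARTITION ({p})\n"
        "SELECT {s}, c.{p}\n"
        "FROM {t} s\n"
        "CROSS JOIN ({cal}) c\n"
        "WHERE s.{p} = '{src}';"
    ).format(t=table, p=partition_col, s=select_list, cal=calendar, src=source_value)


targets = date_strings(date(2019, 1, 2), date(2019, 1, 31))
print(len(targets))  # 30
statement = build_copy_statement(
    "db_t.students", ["id", "name", "marks"], "dt", "2019-01-01", targets
)
print(statement)
```

Each source row is paired with every calendar date, and the calendar's dt value, placed last in the SELECT, decides which partition the copy lands in.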

Does DROP PARTITION delete data from external table in HIVE?

蓝咒 submitted on 2019-12-06 07:57:01
Question: An external table in Hive is partitioned on year, month and day. Does the following query delete data from the external table for the specific partition referenced in the query? ALTER TABLE MyTable DROP IF EXISTS PARTITION(year=2016,month=7,day=11);
Answer 1: The partitioning scheme is not data. It is part of the table DDL stored in metadata (simply put: partition key value + location where the data files are stored). The data itself is stored in files in the partition location