hive-partitions

How to insert/copy one partition's data to multiple partitions in hive?

Submitted by 放肆的年华 on 2019-12-06 03:46:53
I have data for day='2019-01-01' in my Hive table, and I want to copy the same data to every day of Jan-2019 (i.e. '2019-01-02', '2019-01-03' ... '2019-01-31'). I tried the following, but data is only inserted into '2019-01-02' and not into '2019-01-03':

INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03')
SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';

Cross join all your data with calendar dates for the required date range, and use dynamic partitioning:

set hivevar:start_date=2019-01-02;
set hivevar:end_date=2019-01-31;
set hive.exec.dynamic.partition=true;
set
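The answer above is cut off, so here is a hedged end-to-end sketch of the cross-join approach, assuming db_t.students is partitioned by a string column dt; the calendar is generated with the common posexplode-over-spaced-string idiom, which is an assumption, not the verbatim answer:

```sql
-- Enable dynamic partitioning, then cross join the source partition
-- with a generated calendar of target dates.
set hivevar:start_date=2019-01-02;
set hivevar:end_date=2019-01-31;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE db_t.students PARTITION (dt)
SELECT s.id, s.name, s.marks,
       c.dt                          -- dynamic partition value goes last
FROM db_t.students s
CROSS JOIN (
    -- one row per day between start_date and end_date
    SELECT date_add('${hivevar:start_date}', i) AS dt
    FROM (
        SELECT posexplode(
            split(space(datediff('${hivevar:end_date}', '${hivevar:start_date}')), ' ')
        ) AS (i, x)
    ) t
) c
WHERE s.dt = '2019-01-01';
```

Each target date becomes its own dt partition in a single pass; INSERT OVERWRITE with dynamic partitioning replaces only the partitions that actually receive rows.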

How to truncate a partitioned external table in hive?

Submitted by 孤街浪徒 on 2019-12-03 17:18:43
I'm planning to truncate a Hive external table which has one partition. So I used the following command to truncate the table:

hive> truncate table abc;

But it throws an error: Cannot truncate non-managed table abc. Can anyone suggest how to handle this?

Make your table MANAGED first:

ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');

Then truncate:

truncate table abc;

And finally you can make it external again:

ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');

Source: https://stackoverflow.com/questions/53257144/how-to-truncate-a-partitioned
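Since the question concerns a partitioned table, the same flip-truncate-flip sequence can also target a single partition rather than the whole table. A sketch, assuming a partition column named dt with value '2019-01-01' (both names are hypothetical, not from the post):

```sql
-- Temporarily convert to managed so TRUNCATE is allowed
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');

-- Truncate only one partition instead of the whole table
TRUNCATE TABLE abc PARTITION (dt='2019-01-01');

-- Restore the external flag
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
```

Note that some Hive versions treat the TBLPROPERTIES value as case-sensitive, so keeping 'FALSE'/'TRUE' uppercase is the safer form.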

Hive: Add partitions for existing folder structure

Submitted by 自作多情 on 2019-12-02 05:31:12
I have a folder structure in HDFS like the one below. However, no partitions were actually created on the table with ALTER TABLE ADD PARTITION commands, even though the folder structure was set up as if the table had partitions. How can I automatically add all the partitions to the Hive table? (Hive 1.0, external table)

/user/frank/clicks.db
    /date=20190401
        /file0004.csv
    /date=20190402
        /file0009.csv
    /date=20190501
        /file0000.csv
        /file0001.csv
    ...etc

Use the msck repair table command:

MSCK [REPAIR] TABLE tablename;

or, if you are running Hive on EMR:

ALTER TABLE tablename RECOVER PARTITIONS;

Read more
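For the layout above, partition discovery can be sketched like this, assuming the external table is named clicks and is partitioned by a column named date (both names are inferred from the folder structure, not confirmed by the post):

```sql
-- Scan the table's HDFS location and register any partition folders
-- (date=20190401, date=20190402, ...) missing from the metastore
MSCK REPAIR TABLE clicks;

-- On EMR, the equivalent is:
-- ALTER TABLE clicks RECOVER PARTITIONS;

-- Verify what was added
SHOW PARTITIONS clicks;
```

MSCK REPAIR only discovers directories that follow the key=value naming convention, which the structure above already does.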

pyspark - getting Latest partition from Hive partitioned column logic

Submitted by ε祈祈猫儿з on 2019-11-29 15:37:07
I am new to PySpark. I am trying to get the latest partition (date partition) of a Hive table using PySpark dataframes, and did it like below. But I am sure there is a better way to do it using dataframe functions (not by writing SQL). Could you please share inputs on better ways? This solution scans the entire Hive table to get the answer:

df_1 = sqlContext.table("dbname.tablename")
df_1_dates = df_1.select('partitioned_date_column').distinct().orderBy(df_1['partitioned_date_column'].desc())
lat_date_dict = df_1_dates.first().asDict()
lat_dt = lat_date_dict['partitioned_date_column']

I agree
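A cheaper alternative worth sketching here (an assumption on my part, not taken from the truncated answer): partition values live in the Hive metastore, so they can be listed without touching the table's data files, and the maximum taken afterwards:

```sql
-- Metadata-only: reads the metastore, not the table's data files
SHOW PARTITIONS dbname.tablename;
```

From PySpark this would be something like spark.sql("SHOW PARTITIONS dbname.tablename"), after which taking the max over the returned partition strings yields the latest date partition without a full table scan.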

Dynamic partition cannot be the parent of a static partition '3'

Submitted by 独自空忆成欢 on 2019-11-28 01:52:52
Question: While inserting data into a table, Hive threw the error "Dynamic partition cannot be the parent of a static partition '3'" for the query below. Please explain the reason.

INSERT INTO TABLE student_partition PARTITION(course, year = 3)
SELECT name, id, course
FROM student1
WHERE year = 3;

Answer 1: The reason for this exception is that partitions are hierarchical folders. The course folder is the upper level, and year is a nested folder for each year. When you create partitions dynamically, the upper folder
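Because the PARTITION clause must follow the table's declared partition order (course first, then year), the static year value cannot sit beneath the dynamic course. One way out, sketched here under the assumption that the table is partitioned by (course, year), is to make both partitions dynamic:

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Both partition columns are dynamic; their values come from the SELECT,
-- listed in the same order as declared on the table (course, then year)
INSERT INTO TABLE student_partition PARTITION (course, year)
SELECT name, id, course, year
FROM student1
WHERE year = 3;
```

Alternatively, recreating the table partitioned by (year, course) would let the static value come first, since static partitions are only legal above dynamic ones in the hierarchy.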