hive-partitions | 易学教程

get latest data from hive table with multiple partition columns

阅读更多关于 get latest data from hive table with multiple partition columns

问题 I have a hive table with below structure ID string, Value string, year int, month int, day int, hour int, minute int This table is refreshed every 15 mins and it is partitioned with year/month/day/hour/minute columns. Please find below samples on partitions. year=2019/month=12/day=29/hour=19/minute=15 year=2019/month=12/day=30/hour=00/minute=45 year=2019/month=12/day=30/hour=08/minute=45 year=2019/month=12/day=30/hour=09/minute=30 year=2019/month=12/day=30/hour=09/minute=45 I want to select

Reg : Efficiency among query optimizers in hive

阅读更多关于 Reg : Efficiency among query optimizers in hive

问题 After reading about query optimization techniques I came to know about the below techniques. 1. Indexing - bitmap and BTree 2. Partitioning 3. Bucketing I got the difference between partitioning and bucketing, and when to use them but I'm still confused how indexes actually work. Where is the metadata for index is stored? Is it the namenode which is storing it? I.e., actually while creating partitions or buckets we can see multiple directories in hdfs which explains the query performance

Reg : Efficiency among query optimizers in hive

阅读更多关于 Reg : Efficiency among query optimizers in hive

Reg : Efficiency among query optimizers in hive

阅读更多关于 Reg : Efficiency among query optimizers in hive

Reg : Efficiency among query optimizers in hive

阅读更多关于 Reg : Efficiency among query optimizers in hive

How to undo ALTER TABLE … ADD PARTITION without deleting data

阅读更多关于 How to undo ALTER TABLE … ADD PARTITION without deleting data

问题 Let's suppose I have two hive tables, table_1 and table_2 . I use: ALTER TABLE table_2 ADD PARTITION (col=val) LOCATION [table_1_location] Now, table_2 will have the data in table_1 at the partition where col = val . What I want to do is reverse this process. I want table_2 not to have the partition at col=val , and I want table_1 to keep its original data. How can I do this? 回答1: Make your table EXTERNAL first: ALTER TABLE table_2 SET TBLPROPERTIES('EXTERNAL'='TRUE'); Then drop partition,

Issue in Hive Query due to memory

阅读更多关于 Issue in Hive Query due to memory

问题 We have insert query in which we are trying to insert data to partitioned table by reading data from non partitioned table. Query - insert into db1.fact_table PARTITION(part_col1, part_col2) ( col1, col2, col3, col4, col5, col6, . . . . . . . col32 LOAD_DT, part_col1, Part_col2 ) select col1, col2, col3, col4, col5, col6, . . . . . . . col32, part_col1, Part_col2 from db1.main_table WHERE col1=0; Table has 34 columns, number of records in main table depends on size of input file which we

Hive external table optimal partition size

阅读更多关于 Hive external table optimal partition size

问题 What is the optimal size for external table partition? I am planning to partition table by year/month/day and we are getting about 2GB of data daily. 回答1: Optimal table partitioning is such that matching to your table usage scenario. Partitioning should be chosen based on: how the data is being queried (if you need to work mostly with daily data then partition by date). how the data is being loaded (parallel threads should load their own partitions, not overlapped) 2Gb is not too much even

Hive external table optimal partition size

阅读更多关于 Hive external table optimal partition size

How does hive handle insert into internal partition table?

阅读更多关于 How does hive handle insert into internal partition table?

问题 I have a requirement to insert streaming of records into Hive partitioned table. The table structure is something like CREATE TABLE store_transation ( item_name string, item_count int, bill_number int, ) PARTITIONED BY ( yyyy_mm_dd string ); I would like to understand how Hive handles inserts in the internal table. Does all record insert into a single file inside the yyyy_mm_dd=2018_08_31 directory? Or hive splits into multiple files inside a partition, if so when? Which one performs well