hive-partitions

get latest data from hive table with multiple partition columns

风流意气都作罢 submitted on 2021-02-19 05:36:06
Question: I have a Hive table with the following structure: ID string, Value string, year int, month int, day int, hour int, minute int. The table is refreshed every 15 minutes and is partitioned by the year/month/day/hour/minute columns. Sample partitions:
year=2019/month=12/day=29/hour=19/minute=15
year=2019/month=12/day=30/hour=00/minute=45
year=2019/month=12/day=30/hour=08/minute=45
year=2019/month=12/day=30/hour=09/minute=30
year=2019/month=12/day=30/hour=09/minute=45
I want to select
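The question text is cut off here, but going by its title the goal is to read only the most recent partition. A minimal sketch of that selection step, assuming the partition specs have already been fetched as strings in the year=.../month=... form shown in the question (for example from SHOW PARTITIONS output). The function names are hypothetical and this is not the solution from the original thread:

```python
import re

# Matches one partition spec in the layout quoted in the question.
PARTITION_RE = re.compile(
    r"year=(\d+)/month=(\d+)/day=(\d+)/hour=(\d+)/minute=(\d+)"
)


def parse_partition(spec):
    """Turn 'year=2019/month=12/...' into a tuple of ints that sorts chronologically."""
    match = PARTITION_RE.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unexpected partition spec: {spec!r}")
    return tuple(int(part) for part in match.groups())


def latest_partition(specs):
    """Return the spec with the greatest (year, month, day, hour, minute)."""
    return max(specs, key=parse_partition)


def partition_predicate(spec):
    """Build a WHERE-clause fragment that pins a query to a single partition."""
    year, month, day, hour, minute = parse_partition(spec)
    return (
        f"year={year} AND month={month} AND day={day} "
        f"AND hour={hour} AND minute={minute}"
    )


samples = [
    "year=2019/month=12/day=29/hour=19/minute=15",
    "year=2019/month=12/day=30/hour=00/minute=45",
    "year=2019/month=12/day=30/hour=08/minute=45",
    "year=2019/month=12/day=30/hour=09/minute=30",
    "year=2019/month=12/day=30/hour=09/minute=45",
]

latest = latest_partition(samples)
print(latest)
print(partition_predicate(latest))
```

Comparing integer tuples rather than raw strings keeps the ordering correct even if some values are not zero-padded. The generated predicate filters on the partition columns only, so a query using it can be limited to one partition directory.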

Reg : Efficiency among query optimizers in hive

做~自己de王妃 submitted on 2021-02-18 18:13:30
Question: After reading about query optimization techniques, I learned about the following: 1. Indexing (bitmap and B-tree) 2. Partitioning 3. Bucketing. I understand the difference between partitioning and bucketing and when to use each, but I'm still confused about how indexes actually work. Where is the index metadata stored? Is it the NameNode that stores it? That is, when creating partitions or buckets we can see multiple directories in HDFS, which explains the query performance

How to undo ALTER TABLE … ADD PARTITION without deleting data

时光怂恿深爱的人放手 submitted on 2021-01-28 07:07:30
Question: Suppose I have two Hive tables, table_1 and table_2. I use: ALTER TABLE table_2 ADD PARTITION (col=val) LOCATION [table_1_location] Now table_2 has the data of table_1 in the partition where col = val. I want to reverse this: table_2 should no longer have the partition col=val, and table_1 should keep its original data. How can I do this?
Answer 1: First make your table EXTERNAL: ALTER TABLE table_2 SET TBLPROPERTIES('EXTERNAL'='TRUE'); Then drop partition,

Issue in Hive Query due to memory

主宰稳场 submitted on 2021-01-28 07:01:38
Question: We have an INSERT query that loads a partitioned table by reading from a non-partitioned table. Query: insert into db1.fact_table PARTITION(part_col1, part_col2) ( col1, col2, col3, col4, col5, col6, . . . . . . . col32 LOAD_DT, part_col1, Part_col2 ) select col1, col2, col3, col4, col5, col6, . . . . . . . col32, part_col1, Part_col2 from db1.main_table WHERE col1=0; The table has 34 columns, and the number of records in the main table depends on the size of the input file which we

Hive external table optimal partition size

情到浓时终转凉″ submitted on 2020-12-26 03:22:50
Question: What is the optimal size for an external table partition? I am planning to partition the table by year/month/day, and we receive about 2 GB of data daily.
Answer 1: The optimal partitioning is the one that matches your table usage scenario. Partitioning should be chosen based on:
- how the data is queried (if you mostly work with daily data, partition by date);
- how the data is loaded (parallel threads should each load their own partitions, without overlap).
2 GB is not too much even
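To put the 2 GB per day figure in perspective, here is a small back-of-the-envelope sketch comparing partition sizes at daily and roughly monthly granularity. The 128 MB block size is an assumption (check dfs.blocksize on your cluster), and the helper name is hypothetical. It is not part of the original answer:

```python
DAILY_GB = 2      # daily volume stated in the question
BLOCK_MB = 128    # ASSUMPTION: HDFS block size; check dfs.blocksize on your cluster


def partition_stats(days_per_partition, daily_gb=DAILY_GB, block_mb=BLOCK_MB):
    """Return (partition size in MB, number of HDFS blocks) for one partition."""
    size_mb = days_per_partition * daily_gb * 1024
    blocks = -(-size_mb // block_mb)  # ceiling division
    return size_mb, blocks


for label, days in [("year/month/day", 1), ("year/month (30 days)", 30)]:
    size_mb, blocks = partition_stats(days)
    print(f"{label}: {size_mb} MB per partition, about {blocks} blocks")
```

Under these assumptions a daily partition holds 2048 MB, about 16 blocks, and a 30-day partition holds 61440 MB, about 480 blocks. Whether either is a good fit still depends on how the data is queried and loaded, as the answer points out.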

How does hive handle insert into internal partition table?

喜欢而已 submitted on 2020-12-03 08:01:11
Question: I need to insert a stream of records into a partitioned Hive table. The table structure is something like: CREATE TABLE store_transation ( item_name string, item_count int, bill_number int ) PARTITIONED BY ( yyyy_mm_dd string ); I would like to understand how Hive handles inserts into an internal table. Are all records written to a single file inside the yyyy_mm_dd=2018_08_31 directory, or does Hive split them into multiple files within a partition, and if so, when? Which one performs well