CTAS with Dynamic Partition

给你一囗甜甜゛ 提交于 2019-12-12 18:17:07

问题


I want to change an existing table, that contains text format, into orc format. I was able to do it by: (1) creating a table in orc format manually having the partitions and then, (2) using the INSERT OVERWRITE statement to populate the table.

I am trying to use CTAS (Create Table... AS Select...) statement for this. Is there any way I can include the dynamic partitioning with CTAS statement? So, if my text data set has multiple partitions (for example: year and month), can I point this in CTAS statement directly?

The format might be something like this:

CREATE TABLE TEST_TABLE
STORED AS ORC
WITH PARTITION(year, month)
LOCATION '/<my_location>'
tblproperties ("orc.compress"="SNAPPY")
AS SELECT * FROM <existing_table>;

Any idea please?

Note: The reason I'm interested for CTAS statement is as: using this statement, I don't really need to point out each and every columns name. But if I create an orc based table and then populate it manually, I had to indicate all the columns while creating the table. This is okay; but, not a good idea if my existing table contains a lot of columns.


回答1:


Not supported

hive> create table t partitioned by (p int) as select 1 as i;

FAILED: SemanticException [Error 10068]:
CREATE-TABLE-AS-SELECT does not support partitioning in the target table




回答2:


Two Steps:

CREATE TABLE target_table_name LIKE source_table_name;

INSERT OVERWRITE TABLE target_table_name PARTITION(partition_column_name)
SELECT * FROM source_table_name;

Rest stored as ORC, location you can add as per convenience and support by hive query.

For external tables, the steps do not work. The issue comes is, data is updated in pointed directory on HDFS but we cant see changes reflected in table when we do select table. For this you may follow following steps:

Solution 1: Check tables metadata if its updated or not.

Solution 2: If solution 1 does not work, then:

CREATE TABLE target_table_name LIKE source_table_name;

INSERT OVERWRITE TABLE target_table_name PARTITION(partition_column_name) 
  SELECT * FROM source_table_name;

CREATE EXTERNAL TABLE another_table_name LIKE source_table_name 
  STORED AS file_format_of_source_table 
  LOCATION 'location_of_source_table';

Fix the table

 MSCK REPAIR TABLE another_table;

Then you can drop source_table and rename another_table to source_table_name.




回答3:


The other answers are both correct: you can do the CREATE TABLE dest LIKE src; if you have a table that is exactly what you want, but you cannot currently do a truly dynamic CTAS to create a partitioned table from variable fields.

This feature has been added to the upcoming Spark 3.0: https://issues.apache.org/jira/browse/SPARK-26435



来源:https://stackoverflow.com/questions/43241648/ctas-with-dynamic-partition

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!