问题
I want to change an existing table, that contains text format, into orc format. I was able to do it by: (1) creating a table in orc format manually having the partitions and then, (2) using the INSERT OVERWRITE statement to populate the table.
I am trying to use CTAS (Create Table... AS Select...) statement for this. Is there any way I can include the dynamic partitioning with CTAS statement? So, if my text data set has multiple partitions (for example: year and month), can I point this in CTAS statement directly?
The format might be something like this:
CREATE TABLE TEST_TABLE
STORED AS ORC
WITH PARTITION(year, month)
LOCATION '/<my_location>'
tblproperties ("orc.compress"="SNAPPY")
AS SELECT * FROM <existing_table>;
Any idea please?
Note: The reason I'm interested for CTAS statement is as: using this statement, I don't really need to point out each and every columns name. But if I create an orc based table and then populate it manually, I had to indicate all the columns while creating the table. This is okay; but, not a good idea if my existing table contains a lot of columns.
回答1:
Not supported
hive> create table t partitioned by (p int) as select 1 as i;
FAILED: SemanticException [Error 10068]:
CREATE-TABLE-AS-SELECT does not support partitioning in the target table
回答2:
Two Steps:
CREATE TABLE target_table_name LIKE source_table_name;
INSERT OVERWRITE TABLE target_table_name PARTITION(partition_column_name)
SELECT * FROM source_table_name;
Rest stored as ORC, location you can add as per convenience and support by hive query.
For external tables, the steps do not work. The issue comes is, data is updated in pointed directory on HDFS but we cant see changes reflected in table when we do select table. For this you may follow following steps:
Solution 1: Check tables metadata if its updated or not.
Solution 2: If solution 1 does not work, then:
CREATE TABLE target_table_name LIKE source_table_name;
INSERT OVERWRITE TABLE target_table_name PARTITION(partition_column_name)
SELECT * FROM source_table_name;
CREATE EXTERNAL TABLE another_table_name LIKE source_table_name
STORED AS file_format_of_source_table
LOCATION 'location_of_source_table';
Fix the table
MSCK REPAIR TABLE another_table;
Then you can drop source_table and rename another_table
to source_table_name
.
回答3:
The other answers are both correct: you can do the CREATE TABLE dest LIKE src;
if you have a table that is exactly what you want, but you cannot currently do a truly dynamic CTAS to create a partitioned table from variable fields.
This feature has been added to the upcoming Spark 3.0: https://issues.apache.org/jira/browse/SPARK-26435
来源:https://stackoverflow.com/questions/43241648/ctas-with-dynamic-partition