问题
I have 2 external hive tables as follows. I have populated data in them from oracle using sqoop.
create external table transaction_usa
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_usa';
create external table transaction_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_canada';
Now i want to merge above 2 tables data as it is in 1 external hive table with all same fields as in the above 2 tables but with 1 extra column to identify that which data is from which table. The new external table with additional column as source_table
. The new external table is as follows.
create external table transaction_usa_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int,
source_table string
)
row format delimited
stored as textfile
location '/user/gds/bank_ds/tran_usa_canada';
how can I do it.?
回答1:
You do SELECT
from each table and perform UNION ALL
operation on these results and finally insert the result into your third table.
Below is the final hive query:
INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_usa' AS source_table FROM transaction_usa
UNION ALL
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_canada' AS source_table FROM transaction_canada;
Hope this help you!!!
回答2:
You can very well do it by manual partitioning
as well.
CREATE TABLE transaction_new_table (
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
PARTITIONED BY (sourcetablename String)
Then run below command,
load data inpath 'hdfspath' into table transaction_new_table partition(sourcetablename='1')
回答3:
You could use the INSERT INTO Clause of Hive
INSERT INTO TABLE table transaction_usa_canada
SELECT tran_id, acct_id, tran_date, ...'transaction_usa' FROM transaction_usa;
INSERT INTO TABLE table transaction_usa_canada
SELECT tran_id, acct_id, tran_date, ...'transaction_canada' FROM transaction_canada;
来源:https://stackoverflow.com/questions/37299136/insert-data-of-2-hive-external-tables-in-new-external-table-with-additional-colu