问题
I have a main table (Employee) which is having 10 columns and I can load data into it using load data inpath /file1.txt into table Employee
My question is how to handle the same table (Employee) if my file file2.txt has same columns but column 3 and columns 5 are missing. if I directly load data last columns will be NULL
NULL
. but instead it should load 3rd as NULL and 5th column as NULL.
Suppose I have a table Employee and I want to load the file1.txt
and file2.txt
to table.
file1.txt ========== id name sal deptid state coutry 1 aaa 1000 01 TS india 2 bbb 2000 02 AP india 3 ccc 3000 03 BGL india file2.txt id name deptid country 1 second 001 US 2 third 002 ENG 3 forth 003 AUS
In file2.txt
we are missing 2 columns i.e. sal
and state
.
we need to use the same Employee table how to handle it ?
回答1:
I'm not aware of any way to create a table backed by data files with a non-homogenous structure. What you can do however, is to define separate tables for the different column configurations and then define a view that queries both.
I think it's easier if I provide an example. I will use two tables of people, both have a column for name, but one stores height as well, while the other stores weight instead:
> create table table1(name string, height int);
> insert into table1 values ('Alice', 178), ('Charlie', 185);
> create table table2(name string, weight int);
> insert into table2 values ('Bob', 98), ('Denise', 52);
> create view people as
> select name, height, NULL as weight from table1
> union all
> select name, NULL as height, weight from table2;
> select * from people order by name;
+---------+--------+--------+
| name | height | weight |
+---------+--------+--------+
| Alice | 178 | NULL |
| Bob | NULL | 98 |
| Charlie | 185 | NULL |
| Denise | NULL | 52 |
+---------+--------+--------+
Or as a closer example to your problem, let's say that one table has name, height and weight, while the other only has name and weight, thereby height is "missing from the middle":
> create table table1(name string, height int, weight int);
> insert into table1 values ('Alice', 178, 55), ('Charlie', 185, 78);
> create table table2(name string, weight int);
> insert into table2 values ('Bob', 98), ('Denise', 52);
> create view people as
> select name, height, weight from table1
> union all
> select name, NULL as height, weight from table2;
> select * from people order by name;
+---------+--------+--------+
| name | height | weight |
+---------+--------+--------+
| Alice | 178 | 55 |
| Bob | NULL | 98 |
| Charlie | 185 | 78 |
| Denise | NULL | 52 |
+---------+--------+--------+
Be sure to use union all
and not just union
, because the latter tries to remove duplicate rows, which makes it very expensive.
回答2:
It seems like there is no way to directly load into specified columns.
As such, this is what you probably need to do:
- Load data inpath to a (temporary?) table that matches the file
- Insert into relevant columns of final table by selecting the contents of the previous table.
The situation is very similar to this question which covers the opposite scenario (you only want to load a few columns).
来源:https://stackoverflow.com/questions/39580335/how-to-load-data-to-same-hive-table-if-file-has-different-number-of-columns