Question
I am trying to do a FULL OUTER JOIN on 4 Hive tables. The join key is the same, but the schemas of the 4 tables are different. I want to generate all the column values for all the ids present in the 4 tables, but the id column should appear only once with all values included, not 4 times (once from each table).
Query 1
select count(*)
from table1 f FULL OUTER JOIN table2 u on f.id=u.id
FULL OUTER JOIN table3 v on f.id=v.id
FULL OUTER JOIN table4 v_in on f.id=v_in.id;
Count=2787037
Query 2
select count(*)
from table1 f FULL OUTER JOIN table2 u on f.id=u.id
FULL OUTER JOIN table3 v on f.id=v.id
FULL OUTER JOIN table4 v_in on f.id=v_in.id
group by f.id,u.id,v.id,v_in.id, f.name, f.amt, f.add, u.dt, u.ts, v.ea,v.rd,
v_in.c1,v_in.c2,v_in.c3,v_in.c4,v_in.c5;
Count=2787037
How can I generate all the id values from the 4 tables in one column, along with the other column values?
Is there a better way to do this?
Answer 1:
You should just select the columns you want. I think you want coalesce():
select coalesce(f.id, u.id, v.id, v_in.id) as id,
f.name, f.amt, f.add, u.dt, u.ts, v.ea, v.rd,
v_in.c1, v_in.c2, v_in.c3, v_in.c4, v_in.c5
from . . .;
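With full outer join, f.id can be NULL for rows that exist only in a later table, so each join condition has to check every earlier id. You need lots of coalesce()s, or equivalently in lists: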
select . . .
from table1 f full join
table2 u
on f.id = u.id full join
table3 v
on v.id in (f.id, u.id) full join
table4 v_in
on v_in.id in (f.id, u.id, v.id);
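Putting the two fragments together, a complete query might look like the sketch below. It uses the table and column names from the question and spells the join conditions with coalesce() rather than in; the two forms should match the same rows here, because any non-NULL ids on a joined row are already equal. It assumes that id is unique within each table (otherwise rows multiply) and that the Hive version in use accepts these ON conditions, which I have not verified.

```sql
-- Sketch only: names come from the question; id is assumed unique per table.
select coalesce(f.id, u.id, v.id, v_in.id) as id,   -- one id column for all 4 tables
       f.name, f.amt, f.add,
       u.dt, u.ts,
       v.ea, v.rd,
       v_in.c1, v_in.c2, v_in.c3, v_in.c4, v_in.c5
from table1 f full join
     table2 u
     on f.id = u.id full join
     table3 v
     -- f.id is NULL for rows that exist only in table2, so fall back to u.id
     on v.id = coalesce(f.id, u.id) full join
     table4 v_in
     -- same idea: match against whichever earlier id is present
     on v_in.id = coalesce(f.id, u.id, v.id);
```

Columns from a table that has no row for a given id come back as NULL, while the coalesced id column is always populated.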
Source: https://stackoverflow.com/questions/55190125/hive-full-outer-join-with-4-tables-on-same-key-different-schema