Question
Suppose I have the two tables shown below. I want the result that the following SQL would give: insert into B where id not in (select id from A), which would insert the row 3 George into Table B.
How can I implement this in Hive?
Table A
id name
1 Rahul
2 Keshav
3 George
Table B
id name
1 Rahul
2 Keshav
4 Yogesh
Answer 1:
NOT IN in the WHERE clause with uncorrelated subqueries has been supported since Hive 0.13, which was released on 21 April 2014.
select * from A where id not in (select id from B where id is not null);
+----+--------+
| id | name |
+----+--------+
| 3 | George |
+----+--------+
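To get the effect asked for in the question, the same filter can feed an INSERT. This is a minimal sketch, assuming both tables have the columns (id, name) in that order, as in the sample data:

```sql
-- Append to B every row of A whose id is not already present in B.
-- A.id is qualified with the table name so the statement also parses
-- on releases that reject unqualified outer columns in subquery predicates.
INSERT INTO TABLE B
SELECT A.id, A.name
FROM A
WHERE A.id NOT IN (SELECT id FROM B WHERE id IS NOT NULL);
```

With the sample data above, this should append the single row 3 George to Table B.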
On earlier versions, the outer table's column must be qualified with the table name or alias; otherwise the query fails:
hive> select * from A where id not in (select id from B where id is not null);
FAILED: SemanticException [Error 10249]: Line 1:22 Unsupported SubQuery Expression 'id': Correlating expression cannot contain unqualified column references.
hive> select * from A where A.id not in (select id from B where id is not null);
OK
3 George
P.S.
When using NOT IN, add is not null to the inner query unless you are 100% sure that the relevant column contains no null values.
A single null value is enough to make your query return no results.
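If the null handling of NOT IN is a concern, a commonly used alternative is an anti-join written as a LEFT OUTER JOIN. This is a sketch of that swapped-in technique rather than part of the original answer, and it assumes the same A and B tables:

```sql
-- Anti-join: keep the rows of A that find no match in B.
-- A NULL in B.id never satisfies the join condition, so it cannot
-- suppress the whole result the way it does with NOT IN.
-- Caveat: a row of A whose own id is NULL is returned here, whereas
-- NOT IN would drop it when B is non-empty.
SELECT A.id, A.name
FROM A
LEFT OUTER JOIN B ON A.id = B.id
WHERE B.id IS NULL;
```

Because this form does not rely on subquery support, it is also an option on Hive releases older than 0.13.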
Source: https://stackoverflow.com/questions/44714625/how-to-use-not-in-in-hive