Suppose I have 2 tables as shown below. I want to achieve the result that SQL would give with: insert into B where id not in (select id from A)
which will inse
NOT IN in the WHERE clause with uncorrelated subqueries has been supported since Hive 0.13, which was released more than 3 years ago, on 21 April 2014.
select * from A where id not in (select id from B where id is not null);
+----+--------+
| id | name |
+----+--------+
| 3 | George |
+----+--------+
On earlier versions, the column of the outer table must be qualified with the table name or alias.
hive> select * from A where id not in (select id from B where id is not null);
FAILED: SemanticException [Error 10249]: Line 1:22 Unsupported SubQuery Expression 'id': Correlating expression cannot contain unqualified column references.
hive> select * from A where A.id not in (select id from B where id is not null);
OK
3 George
P.S.
When using NOT IN, you should add IS NOT NULL to the inner query, unless you are 100% sure that the relevant column does not contain null values.
One null value is enough to cause your query to return no results, because comparing any id against NULL yields unknown rather than true.
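The NULL pitfall can be reproduced with a small sketch. This is not Hive: it uses Python's built-in sqlite3 module as a stand-in, on the assumption that NOT IN follows the same standard SQL NULL semantics in both. The table contents are hypothetical sample data; only the row (3, George) comes from the output above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption:
# NOT IN treats NULL the same way in both, per standard SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, name TEXT);
    CREATE TABLE B (id INTEGER, name TEXT);
    -- Hypothetical sample rows; B contains one NULL id.
    INSERT INTO A VALUES (1, 'Dan'), (2, 'Mary'), (3, 'George');
    INSERT INTO B VALUES (1, 'Dan'), (2, 'Mary'), (NULL, 'Unknown');
""")

# Without the filter, "3 NOT IN (1, 2, NULL)" evaluates to unknown,
# so no row of A satisfies the WHERE clause.
unfiltered = conn.execute(
    "SELECT * FROM A WHERE id NOT IN (SELECT id FROM B)"
).fetchall()

# With IS NOT NULL in the inner query, the NULL is excluded and the
# comparison behaves as expected.
filtered = conn.execute(
    "SELECT * FROM A WHERE id NOT IN (SELECT id FROM B WHERE id IS NOT NULL)"
).fetchall()

print(unfiltered)  # []
print(filtered)    # [(3, 'George')]
```

The single NULL id in B is what empties the first result; removing that row, or filtering it out as in the second query, brings George back.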