I have two tables, each with a single key column. The keys in table a are a subset of the keys in table b. I need to select the keys from table b that are NOT in table a.
Here is a
If you want results from table b, perhaps you can do the following instead?
SELECT b.key FROM b LEFT OUTER JOIN a ON b.key = a.key WHERE a.key IS NULL;
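One way to sanity-check this query is to run it against a tiny in-memory SQLite database from Python. This is only a sketch: SQLite stands in for Hive or Spark here, the sample keys are made up, and the helper name keys_missing_from_a is hypothetical. An ORDER BY is added so the output is deterministic.

```python
import sqlite3

def keys_missing_from_a(a_keys, b_keys):
    """Return the keys present in table b but absent from table a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (key INTEGER)")
    conn.execute("CREATE TABLE b (key INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?)", [(k,) for k in a_keys])
    conn.executemany("INSERT INTO b VALUES (?)", [(k,) for k in b_keys])
    # Every row of b is kept by the LEFT OUTER JOIN; rows without a match
    # in a get NULL for a.key, and the WHERE clause keeps only those rows.
    rows = conn.execute(
        "SELECT b.key FROM b LEFT OUTER JOIN a ON b.key = a.key "
        "WHERE a.key IS NULL ORDER BY b.key"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

# Made-up sample data: a = {1, 3} is a subset of b = {1, 2, 3, 4}.
print(keys_missing_from_a([1, 3], [1, 2, 3, 4]))  # prints [2, 4]
```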
The answer to your issue should be
SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key WHERE b.key IS NULL;
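This returns every key from a, whether or not it has a match in b; the WHERE clause then keeps only the rows that have no match in b. Note that this yields the keys in a that are missing from b. To get the keys in b that are missing from a, swap the roles of the two tables.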
Or you can try
SELECT a.key FROM a LEFT ANTI JOIN b ON a.key = b.key
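If LEFT ANTI JOIN is not available in your engine, the same anti-join can usually be written with NOT EXISTS. The sketch below uses SQLite from Python purely for illustration; the table names l and r, the helper anti_join, and the sample keys are all hypothetical. It also shows why the direction matters: when a is a subset of b, anti-joining a against b returns nothing.

```python
import sqlite3

def anti_join(left_keys, right_keys):
    """Return the keys in the left table that have no match in the right table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE l (key INTEGER)")
    conn.execute("CREATE TABLE r (key INTEGER)")
    conn.executemany("INSERT INTO l VALUES (?)", [(k,) for k in left_keys])
    conn.executemany("INSERT INTO r VALUES (?)", [(k,) for k in right_keys])
    # NOT EXISTS keeps a left row only when no right row shares its key.
    rows = conn.execute(
        "SELECT l.key FROM l WHERE NOT EXISTS "
        "(SELECT 1 FROM r WHERE r.key = l.key) ORDER BY l.key"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

a_keys = [1, 3]        # made-up data: a is a subset of b
b_keys = [1, 2, 3, 4]
print(anti_join(a_keys, b_keys))  # prints []
print(anti_join(b_keys, a_keys))  # prints [2, 4]
```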
I tried LEFT SEMI JOIN as a replacement for an IN subquery on CDH 5.7.0 with Spark 1.6.
The left semi join gave wrong results: they did not match what the IN subquery returns.