How can I use the IN clause in Hive? I want to write something like this in Hive: select x from y where y.z in (select distinct z from y) order by x; But I am not finding any way o
According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select:
"Hive does not support IN, EXISTS or subqueries in the WHERE clause."
You might want to look at: https://issues.apache.org/jira/browse/HIVE-801
https://issues.apache.org/jira/browse/HIVE-1799
Hive 0.13 now supports IN/EXISTS in the WHERE clause. The issue https://issues.apache.org/jira/browse/HIVE-784 has been resolved after 4 years :)
Hive supports IN perfectly well; what it does not support is a subquery in the WHERE clause.
There is a feature ticket from Facebook engineers that has been open for 4 years: https://issues.apache.org/jira/browse/HIVE-784?focusedCommentId=13579059
Assume table t1(id, name) and table t2(id, name).
Listing only those ids from t1 that also exist in t2 (basically an IN clause):
hive> select a.id from t1 a left semi join t2 b on (a.id = b.id);
Listing only those ids from t1 that exist in t1 but not in t2 (basically a NOT IN clause):
hive> select a.id from t1 a left outer join t2 b on (a.id = b.id) where b.id is null;
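Applied to the query from the question, the same LEFT SEMI JOIN pattern might look like the sketch below. It reuses the table y and columns x and z from the question; the aliases a and b are only illustrative. Note that with LEFT SEMI JOIN the right-hand table may be referenced only in the ON condition, not in the SELECT list or WHERE clause.

```sql
-- Sketch: rewrite of
--   select x from y where y.z in (select distinct z from y) order by x;
-- for Hive versions without subquery support in WHERE.
-- DISTINCT is not needed: a semi join returns each left row at most once.
SELECT a.x AS x
FROM y a
LEFT SEMI JOIN y b ON (a.z = b.z)
ORDER BY x;
```

As with IN, rows whose z is NULL do not match the join condition and are left out of the result.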
Hive has supported IN/EXISTS subqueries since Hive 0.13, with a few limitations. Please refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries for more details.
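As a rough sketch of what that looks like on Hive 0.13 or later, here are the two join examples from the earlier answer written with subqueries instead. The tables t1 and t2 are the hypothetical ones from that answer. Among the limitations listed on the linked page: an IN/NOT IN subquery may select only a single column, and an EXISTS/NOT EXISTS subquery must contain a correlated predicate.

```sql
-- Assumes Hive 0.13+ and the example tables t1(id, name), t2(id, name).

-- ids from t1 that also exist in t2 (single-column IN subquery)
SELECT a.id
FROM t1 a
WHERE a.id IN (SELECT b.id FROM t2 b);

-- ids from t1 that have no match in t2 (correlated NOT EXISTS subquery)
SELECT a.id
FROM t1 a
WHERE NOT EXISTS (SELECT b.id FROM t2 b WHERE a.id = b.id);
```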
I'm using Hive version 0.7.1, and this works:
SELECT * FROM MYTABLE WHERE MYCOLUMN IN ('thisThing','thatThing');
I tested this on a column of type STRING, so I am not sure whether it works universally for all data types. I noticed, as Wawrzyniec mentioned above, that the Hive Language Manual says IN is not supported and to use LEFT SEMI JOIN instead, but it worked fine in my test.