Question
I am using Spark SQL to join three tables, but I get an error when the join has multiple column conditions.
test_table = (T1.join(T2,T1.dtm == T2.kids_dtm, "inner")
.join(T3, T3.kids_dtm == T1.dtm
and T2.room_id == T3.room_id
and T2.book_id == T3.book_id, "inner"))
ERROR:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/spark/python/pyspark/sql/column.py", line 447, in __nonzero__
raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Instead of `and`, I have tried `&` and `&&`, but neither works. Any help would be appreciated.
Answer 1:
Never mind, the following works, using `&` with each condition wrapped in brackets:
test_table = (T1.join(T2, T1.dtm == T2.kids_dtm, "inner")
.join(T3, (T3.kids_dtm == T1.dtm)
& (T2.room_id == T3.room_id)
& (T2.book_id == T3.book_id), "inner"))
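A note on why both `&` and the brackets are needed: `and` fails because Python evaluates it by calling `__bool__` on a `Column`, which raises the error in the traceback, while `&` is overloaded by PySpark to build a combined column expression. The brackets matter because Python gives `&` higher precedence than `==`, so an unbracketed chain would be grouped the wrong way. A minimal sketch using only the standard-library `ast` module (independent of Spark) shows the difference in how the two forms parse:

```python
import ast

# Without brackets, `&` binds more tightly than `==`, so the expression
# is parsed as a chained comparison: a == (b & c) == d.
unbracketed = ast.parse("a == b & c == d", mode="eval").body
print(type(unbracketed).__name__)  # → Compare

# With brackets, each equality is parsed first, then combined with `&`:
# (a == b) & (c == d), which is what the join condition needs.
bracketed = ast.parse("(a == b) & (c == d)", mode="eval").body
print(type(bracketed).__name__)    # → BinOp
```

The same precedence rule is why PySpark's documentation always shows compound join conditions with each comparison parenthesized.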
Source: https://stackoverflow.com/questions/37448081/spark-multiple-conditions-join