How to filter a column on values in a list in PySpark?

倾然丶 夕夏残阳落幕 submitted on 2019-12-12 10:43:09

Question


I have a dataframe dfRawData and need to keep only the rows where column X is CB, CI, or CR. I used the code below:

df = dfRawData.filter(col("X").between("CB","CI","CR"))

But I am getting the following error:

between() takes exactly 3 arguments (4 given)

Please let me know how I can resolve this issue.


Answer 1:


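between checks whether a value lies within a range, so it takes exactly two arguments: a lower bound and an upper bound. The error counts the column object itself as an extra argument, which is why three values are reported as four. between cannot be used to check whether a column value appears in a list. To do that, use isin: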

from pyspark.sql.functions import col
df = dfRawData.where(col("X").isin(["CB", "CI", "CR"]))
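
To see why between is the wrong tool here, the two predicates can be modelled in plain Python. This is only a sketch of their semantics, not PySpark code: in_range and in_list are hypothetical helper names, and the sample values are made up. It assumes between is inclusive on both ends and that strings compare lexicographically, which holds for short ASCII values like these.

```python
# Plain-Python stand-ins for the two Column predicates.
# in_range and in_list are hypothetical helpers, not part of PySpark.

def in_range(value, lower, upper):
    """Model of col.between(lower, upper): an inclusive range check."""
    return lower <= value <= upper


def in_list(value, allowed):
    """Model of col.isin(allowed): a membership check."""
    return value in allowed


# Made-up sample values for column X.
values = ["CA", "CB", "CC", "CI", "CM", "CR", "CS"]
wanted = ["CB", "CI", "CR"]

range_matches = [v for v in values if in_range(v, "CB", "CR")]
list_matches = [v for v in values if in_list(v, wanted)]

print(range_matches)  # ['CB', 'CC', 'CI', 'CM', 'CR']
print(list_matches)   # ['CB', 'CI', 'CR']
```

A range from "CB" to "CR" also lets through unwanted values such as "CC" and "CM", so a membership test is the better fit when the goal is to match a fixed set of values.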


Source: https://stackoverflow.com/questions/46707339/how-to-filter-column-on-values-in-list-in-pyspark
