Question
I have a DataFrame, dfRawData, and I need to filter it on column X, keeping the rows whose value is CB, CI, or CR. I used the code below:
df = dfRawData.filter(col("X").between("CB","CI","CR"))
But I am getting the following error:
between() takes exactly 3 arguments (4 given)
Please let me know how I can resolve this issue.
Answer 1:
between checks whether a value lies between two values, so it takes exactly two inputs: a lower bound and an upper bound. Passing three values is what triggers the error above. It cannot be used to check whether a column value is in a list. To do that, use isin:
df = dfRawData.where(col("X").isin({"CB", "CI", "CR"}))
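To illustrate why a range check cannot replace a membership test, here is a small plain-Python sketch of the two semantics. It is not Spark code: the helpers `between` and `isin` are hypothetical stand-ins named after the column methods, and the sample values are made up for the example. It assumes an inclusive range on both bounds and ordinary string ordering.

```python
# Plain-Python stand-ins for the two column methods (hypothetical helpers, not PySpark).

def between(value, lower, upper):
    # Inclusive range check: true when lower <= value <= upper.
    return lower <= value <= upper


def isin(value, allowed):
    # Membership check: true only when value is one of the allowed items.
    return value in allowed


# Made-up sample values for column X.
values = ["CA", "CB", "CC", "CI", "CR", "CZ"]
wanted = {"CB", "CI", "CR"}

# A range from "CB" to "CR" also lets through values that merely sort in between.
in_range = [v for v in values if between(v, "CB", "CR")]

# A membership test keeps exactly the listed values.
in_list = [v for v in values if isin(v, wanted)]

print(in_range)  # ['CB', 'CC', 'CI', 'CR']
print(in_list)   # ['CB', 'CI', 'CR']
```

Note that the range check lets "CC" through even though it was never requested, which is why isin is the right fit for a fixed list of values.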
Source: https://stackoverflow.com/questions/46707339/how-to-filter-column-on-values-in-list-in-pyspark