Question
I have a data frame with two columns, as shown below:
import datatable as dt
from datatable import f

DT_EX = dt.Frame({'film':['Don','Warriors','Dragon','Chicago','Lion','Don','Chicago','Warriors'],
                  'gross':[400,500,600,100,200,300,900,1000]})
In the first case, I would like to keep only the observations whose film is Don or Chicago, as in the code below:
DT_EX[((f.film=="Don") | (f.film=="Chicago")),:]
In the second case, I filter on three values:
DT_EX[((f.film=="Don") | (f.film=="Chicago") | (f.film=="Lion")),:]
To filter on more than 5 or 10 values, we would have to write out a logical expression covering every one of them, which would definitely be time-consuming.
Is there a datatable way to do this faster, something like the %in% and %chin% filtering operators available in R's data.table?
Answer 1:
The Python equivalent of R's %in% operator is simply called in. Unfortunately, this operator hasn't been implemented in datatable yet; the relevant feature request is https://github.com/h2oai/datatable/issues/699.
In the meantime, I'd recommend using the standard reduce function (from functools) with the or_ operator (from the operator module), which chains one equality test per value with |:
>>> import functools
>>> import operator
>>>
>>> films = ['Lion', 'Chicago', 'Don']
>>> filter = functools.reduce(operator.or_, (f.film == item for item in films))
>>> DT_EX[filter, :]
   | film     gross
-- + -------  -----
 0 | Don        400
 1 | Chicago    100
 2 | Lion       200
 3 | Don        300
 4 | Chicago    900

[5 rows x 2 columns]
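To make it clearer what the reduce call builds, here is a minimal plain-Python sketch of the same fold on the same data, with no datatable required. The names films_col, gross_col, wanted and matches_any are made up for this illustration, and the False starting value passed to reduce is an addition of this sketch, not part of the answer above.

```python
import functools
import operator

# Same data as DT_EX, held as plain Python lists (illustration only).
films_col = ['Don', 'Warriors', 'Dragon', 'Chicago', 'Lion', 'Don', 'Chicago', 'Warriors']
gross_col = [400, 500, 600, 100, 200, 300, 900, 1000]

wanted = ['Lion', 'Chicago', 'Don']

def matches_any(value, candidates):
    """Fold `value == c` over all candidates with `|` (hypothetical helper)."""
    # Equivalent to (value == c0) | (value == c1) | ...
    # False is the neutral element for `|`, so an empty list yields False.
    return functools.reduce(operator.or_, (value == c for c in candidates), False)

# Keep the rows whose film matches any of the wanted values.
selected = [(film, gross)
            for film, gross in zip(films_col, gross_col)
            if matches_any(film, wanted)]

for film, gross in selected:
    print(film, gross)
```

This keeps the same five rows as the datatable output above. Note that reduce without a starting value, as in the answer, raises a TypeError when the list of values is empty, so it may be worth checking for that case before building the filter.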
Source: https://stackoverflow.com/questions/61494957/how-to-filter-observations-for-the-multiple-values-passed-in-the-i-expression-of