How to filter observations for multiple values passed in the i expression of a pydatatable Frame?

五迷三道 submitted on 2020-07-22 18:46:10

Question


I have a data frame with two columns, as shown below:

import datatable as dt
from datatable import f

DT_EX = dt.Frame({'film':['Don','Warriors','Dragon','Chicago','Lion','Don','Chicago','Warriors'],
                  'gross':[400,500,600,100,200,300,900,1000]})

In the first case, I would like to keep the observations whose film is Don or Chicago, as in the code below:

DT_EX[((f.film=="Don") | (f.film=="Chicago")),:]

In the second case, I would apply the filter for three values:

DT_EX[((f.film=="Don") | (f.film=="Chicago") | (f.film=="Lion")),:]

When filtering on more than 5 or 10 values, we have to write out a logical expression covering every one of them, which is definitely a time-consuming task.

Is there a datatable way to get this done faster, similar to the %in% and %chin% filtering options available in R's data.table?


Answer 1:


The Python equivalent of R's %in% operator is simply called in. Unfortunately, this operator hasn't been implemented in datatable yet; the relevant feature request is https://github.com/h2oai/datatable/issues/699.

In the meantime, I'd recommend using the standard reduce function with the or_ operator:

>>> import functools
>>> import operator
>>>
>>> films = ['Lion', 'Chicago', 'Don']
>>> row_filter = functools.reduce(operator.or_, (f.film == item for item in films))
>>> DT_EX[row_filter, :]
   | film     gross
-- + -------  -----
 0 | Don        400
 1 | Chicago    100
 2 | Lion       200
 3 | Don        300
 4 | Chicago    900

[5 rows x 2 columns]
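
To see what the reduce call above folds together, here is a minimal plain-Python sketch that applies the same pattern to ordinary values instead of datatable expressions. The names matches_any, wanted, mask and kept_rows are hypothetical and chosen only for illustration; nothing here is part of the datatable API. One assumption worth flagging: this sketch passes False as an initial value so that an empty list of candidates yields no match, whereas reduce called without an initial value raises a TypeError on an empty sequence.

```python
import functools
import operator

# Same film column as in the question, as a plain Python list.
films = ['Don', 'Warriors', 'Dragon', 'Chicago', 'Lion', 'Don', 'Chicago', 'Warriors']
wanted = ['Lion', 'Chicago', 'Don']


def matches_any(value, candidates):
    """Fold (value == c0) | (value == c1) | ... from left to right.

    Hypothetical helper: the initial value False makes an empty
    candidate list return False instead of raising TypeError.
    """
    comparisons = (value == candidate for candidate in candidates)
    return functools.reduce(operator.or_, comparisons, False)


# One boolean per row, then the positions of the rows that pass.
mask = [matches_any(film, wanted) for film in films]
kept_rows = [index for index, keep in enumerate(mask) if keep]
print(kept_rows)
```

The positions collected in kept_rows correspond to the rows that the datatable filter keeps in the output above.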


Source: https://stackoverflow.com/questions/61494957/how-to-filter-observations-for-the-multiple-values-passed-in-the-i-expression-of
