Filter multiple conditions dplyr

Anonymous (unverified), submitted 2019-12-03 02:50:02

Question:

I have a data.frame with character data in one of the columns. I would like to filter multiple options in the data.frame from the same column. Is there an easy way to do this that I'm missing?

Example: data.frame name = dat

days      name
88        Lynn
11        Tom
2         Chris
5         Lisa
22        Kyla
1         Tom
222       Lynn
2         Lynn

I'd like to filter for Tom and Lynn, for example.
When I do:

target <- c("Tom", "Lynn")
filt <- filter(dat, name == target)

I get this error:

longer object length is not a multiple of shorter object length 

Answer 1:

You need %in% instead of ==:

library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target)  # equivalently, dat %>% filter(name %in% target)

Produces

  days name
1   88 Lynn
2   11  Tom
3    1  Tom
4  222 Lynn
5    2 Lynn
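For a version that runs from a clean session, here is a minimal sketch. The data.frame() call that rebuilds the sample is an assumption (the question only shows the printed values), and the names kept and dropped are hypothetical, chosen for illustration.

```r
library(dplyr)

# Sample data from the question; this construction is an assumption.
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)

target <- c("Tom", "Lynn")

# Keep the rows whose name appears anywhere in target.
kept <- dat %>% filter(name %in% target)

# The complement: drop those names instead.
dropped <- dat %>% filter(!(name %in% target))
```

Negating the %in% test gives the opposite selection, which is useful if "filter out" was meant literally.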

To understand why, consider what happens here:

dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:

Lynn  == Tom
Tom   == Lynn
Chris == Tom
Lisa  == Lynn
... continue repeating Tom and Lynn until end of data frame

In this case I don't get an error. I suspect your actual data frame has a number of rows that does not allow clean recycling, whereas the sample you provided does (8 rows, a multiple of 2). If the sample had had an odd number of rows, I would have gotten the same message as you. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:

return TRUE for every value in an odd position that is equal to "Tom", or every value in an even position that is equal to "Lynn".

It so happens that the last value in your sample data frame sits in an even position (the 8th) and is equal to "Lynn", hence the one TRUE above.

To contrast, dat$name %in% target says:

for each value in dat$name, check that it exists in target.

Very different. Here is the result:

[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE 
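The difference can be checked directly in base R. The sketch below is illustrative only: the name vector is copied from the sample, and recycled, by_position, by_membership and msg are hypothetical names. It spells out the recycling that == performs, compares it with %in%, and captures the message R raises when the lengths do not divide evenly (in base R this arrives as a warning).

```r
name <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# == compares position by position against the recycled target.
recycled <- rep(target, length.out = length(name))
by_position <- name == recycled   # same values as name == target

# %in% tests membership, independent of position.
by_membership <- name %in% target

# With 7 values, target cannot be recycled a whole number of times,
# so == raises the length-mismatch warning; capture its text.
msg <- tryCatch(
  {
    name[1:7] == target
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
```

Here by_position has a single TRUE in the last slot, while by_membership marks every Tom and Lynn.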

Note your problem has nothing to do with dplyr, just the mis-use of ==.



Answer 2:

Using the base package:

df 

Output:

  days name 1   88 Lynn 2   11  Tom 6    1  Tom 7  222 Lynn 8    2 Lynn 
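A minimal base R sketch of this kind of subsetting follows. It assumes df holds the question's sample data (the data.frame() call is an assumption), and res_index and res_subset are hypothetical names. It is one plausible way to get output of this shape, not necessarily the code this answer used.

```r
# Assumed stand-in for the sample data from the question.
df <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)

# Logical indexing: keep the rows whose name is in the set.
res_index <- df[df$name %in% c("Tom", "Lynn"), ]

# subset() expresses the same condition without repeating df$.
res_subset <- subset(df, name %in% c("Tom", "Lynn"))
```

Both forms keep the original row names (1, 2, 6, 7, 8), which is why the row labels in the output are not renumbered.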

Using sqldf:

library(sqldf)
# Two alternatives:
sqldf('SELECT *
      FROM df
      WHERE name = "Tom" OR name = "Lynn"')
sqldf('SELECT *
      FROM df
      WHERE name IN ("Tom", "Lynn")')


Answer 3:

This can be achieved using the dplyr package, which is available on CRAN. The simple way to achieve this:

  1. Install the dplyr package.

  2. library(dplyr) df

Explanation:

So, once we’ve downloaded dplyr, we create a new data frame by using two different functions from this package:

filter: the first argument is the data frame; the second argument is the condition by which we want it subsetted. The result is the entire data frame with only the rows we wanted.

select: the first argument is the data frame; the second argument is the names of the columns we want selected from it. We don’t have to use the names() function, and we don’t even have to use quotation marks. We simply list the column names as objects.
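The two calls described above can be sketched as follows. This is an illustration under assumptions, not necessarily this answer's exact code: the data.frame() call rebuilds the question's sample, the Tom/Lynn condition is taken from the question, and rows and result are hypothetical names.

```r
library(dplyr)

# Sample data from the question; this construction is an assumption.
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)

# filter(): data frame first, then the row condition.
rows <- filter(dat, name == "Tom" | name == "Lynn")

# select(): data frame first, then bare column names, no quotes needed.
result <- select(rows, days, name)
```

Each == here compares against a single string, so no recycling is involved; with many names, name %in% c(...) is shorter than chaining | conditions.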


