Question
I have a dataframe in which one column contains numeric vectors. I want to filter rows based on a condition involving that column. This is a simplified example.
df <- data.frame(id = LETTERS[1:3], name = c("Alice", "Bob", "Carol"))
mylist <- list(c(1, 2, 3), c(4, 5), c(1, 3, 4))
df$numvecs <- mylist
df
# id name numvecs
# 1 A Alice 1, 2, 3
# 2 B Bob 4, 5
# 3 C Carol 1, 3, 4
I can use something like mapply, e.g.:
mapply(function(x,y) x=="B" & 4 %in% y, df$id, df$numvecs)
This correctly returns TRUE for the second row and FALSE for rows 1 and 3.
However, I have reasons why I'd like to use dplyr filter instead of mapply, but I can't get dplyr filter to operate correctly on the numvecs column. Instead of returning two rows, the following returns no rows.
filter(df, 4 %in% numvecs)
# [1] id name numvecs
# <0 rows> (or 0-length row.names)
What am I missing here? How can I filter on a conditional expression involving the numvecs column?
Ideally, I'd also like to use the standard-evaluation variant filter_, so that I can pass the filter condition as an argument. Any help appreciated. Thanks.
Answer 1:
We can still use mapply with filter:
library(dplyr)
filter(df, mapply(function(x, y) x == "B" & 4 %in% y, id, numvecs))
# id name numvecs
#1 B Bob 4, 5
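As background on why a per-row function is needed at all, here is a minimal base R sketch that rebuilds the question's df. It suggests that %in% treats each element of the list column as a whole, so `4 %in% numvecs` yields a single FALSE, which filter then applies to every row. The names whole_column and per_row are only illustrative.

```r
# Rebuild the example data from the question.
df <- data.frame(id = LETTERS[1:3], name = c("Alice", "Bob", "Carol"))
df$numvecs <- list(c(1, 2, 3), c(4, 5), c(1, 3, 4))

# %in% matches 4 against each list element as a whole, not against the
# numbers inside it, so the result is a single FALSE rather than one
# value per row.
whole_column <- 4 %in% df$numvecs
whole_column
# [1] FALSE

# Testing each vector separately gives one logical value per row.
per_row <- vapply(df$numvecs, function(vec) 4 %in% vec, logical(1))
per_row
# [1] FALSE  TRUE  TRUE
```

A logical vector with one entry per row, like per_row, is what filter needs as its condition.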
Or use map from purrr:
library(purrr)
filter(df, unlist(map(numvecs, ~4 %in% .x)))
# id name numvecs
#1 B Bob 4, 5
#2 C Carol 1, 3, 4
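As a possible variation on the map approach, purrr also provides map_lgl, which returns a logical vector directly, so the unlist step is not needed. This is a sketch assuming dplyr and purrr are installed; has_four is just an illustrative name.

```r
library(dplyr)
library(purrr)

# Rebuild the example data from the question.
df <- data.frame(id = LETTERS[1:3], name = c("Alice", "Bob", "Carol"))
df$numvecs <- list(c(1, 2, 3), c(4, 5), c(1, 3, 4))

# map_lgl() returns a logical vector and raises an error if any element
# does not produce a single TRUE/FALSE value.
has_four <- filter(df, map_lgl(numvecs, ~ 4 %in% .x))
has_four
```

This should keep the same rows as the unlist(map(...)) version, namely Bob and Carol.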
Or we can do this in a pipe chain:
df %>%
  .$numvecs %>%
  map(~ 4 %in% .x) %>%
  unlist %>%
  df[., ]
# id name numvecs
#2 B Bob 4, 5
#3 C Carol 1, 3, 4
Answer 2:
You can use sapply on the numvecs column to create a logical vector for subsetting:
library(dplyr)
filter(df, sapply(numvecs, function(vec) 4 %in% vec), id == "B")
# id name numvecs
# 1 B Bob 4, 5
filter(df, sapply(numvecs, function(vec) 4 %in% vec))
# id name numvecs
# 1 B Bob 4, 5
# 2 C Carol 1, 3, 4
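On the question's wish to pass the condition as an argument: one possible route, sketched below, is to wrap the filter call in an ordinary function rather than rely on filter_, which has since been deprecated in dplyr. The helper rows_containing is hypothetical (it is not part of dplyr) and assumes the list column is named numvecs. It uses vapply instead of sapply so the result is always a logical vector.

```r
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(id = LETTERS[1:3], name = c("Alice", "Bob", "Carol"))
df$numvecs <- list(c(1, 2, 3), c(4, 5), c(1, 3, 4))

# Hypothetical helper: keep the rows whose numvecs vector contains `value`.
# Assumes `data` has a list column called numvecs and no column called value.
rows_containing <- function(data, value) {
  filter(data, vapply(numvecs, function(vec) value %in% vec, logical(1)))
}

rows_containing(df, 4)   # rows for Bob and Carol
rows_containing(df, 2)   # row for Alice only
```

The value to search for is then a plain function argument, so it can come from a variable or user input.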
Source: https://stackoverflow.com/questions/38677497/r-dplyr-filter-a-dataframe-that-contains-a-column-of-numeric-vectors