I have a dataframe, dat:
dat<-data.frame(col1=rep(1:4,3),
col2=rep(letters[24:26],4),
col3=letters[1:12])
I want to filter dat on two different columns using ONLY the combinations given by the rows in the data frame filter:
filter<-data.frame(col1=1:3,col2=NA)
lists<-list(list("x","y"),list("y","z"),list("x","z"))
filter$col2<-lists
So, for example, rows containing (1,x) and (1,y) would be selected, but not (1,z), (2,x), or (3,y).
I know how I would do it using a for loop:
# create an empty frame with dat's columns to collect results in
results<-dat[0,]
for(f in 1:nrow(filter)){
  temp_filter<-filter[f,]
  # keep rows matching this col1 value and any of its allowed col2 values
  temp_dat<-dat[dat$col1==temp_filter[1,1] &
                dat$col2%in%unlist(temp_filter[1,2]),]
  results<-rbind(results,temp_dat)
}
Or if you prefer dplyr style:
library(dplyr)
results<-dat[0,]
for(f in 1:nrow(filter)){
  temp_filter<-filter[f,]
  # dplyr::filter is spelled out because a data frame is also named filter
  temp_dat<-dplyr::filter(dat,col1==temp_filter[1,1] &
                          col2%in%unlist(temp_filter[1,2]))
  results<-rbind(results,temp_dat)
}
results should return
col1 col2 col3
1 1 x a
5 1 y e
2 2 y b
6 2 z f
3 3 z c
7 3 x g
I would normally do the filtering using a merge, but I can't now since I have to check col2 against a list rather than a single value. The for loop works, but I figured there would be a more efficient way to do this, probably using some variation of apply or do.call.
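One way to get back to a plain merge is to expand the list column first, so that each allowed (col1, col2) pair sits on its own row. The following is a minimal base R sketch of that idea, not a definitive answer; the name allowed is introduced here for illustration and does not appear in the question.

dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12])
filter <- data.frame(col1 = 1:3, col2 = NA)
filter$col2 <- list(list("x", "y"), list("y", "z"), list("x", "z"))

# Expand the list column: one row per allowed (col1, col2) pair.
# 'allowed' is a hypothetical helper name, not part of the original code.
allowed <- data.frame(
  col1 = rep(filter$col1, lengths(filter$col2)),
  col2 = unlist(filter$col2)
)

# An inner join keeps only the rows of dat whose pair appears in allowed
results <- merge(dat, allowed, by = c("col1", "col2"))

Note that merge sorts the result by the key columns, so the row order differs from the loop version even though the same six rows are selected.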
We could use dplyr::anti_join() to do the row exclusion filtering for us, if we had two dataframes:
index <- data.frame(col1 = as.character(filter[,1]),
                    col2 = filter[,2])
anti_join(dat, index)
Joining, by = c("col1", "col2")
col1 col2 col3
1 4 x d
2 1 y e
3 2 z f
4 3 x g
5 4 y h
6 1 z i
7 2 x j
8 3 y k
9 4 z l
Mostly base, with a little help from dplyr:
dplyr::setdiff(dat,merge(dat,setNames(as.data.frame(filter),names(dat)[1:2])))
col1 col2 col3
1 4 x d
2 1 y e
3 2 z f
4 3 x g
5 4 y h
6 1 z i
7 2 x j
8 3 y k
9 4 z l
A pure base R solution, though not as pretty, and you lose the row order:
subset(merge(dat,`[[<-`(setNames(as.data.frame(filter),names(dat)[1:2]),"x",value=1),all.x=TRUE),is.na(x),-4)
col1 col2 col3
2 1 y e
3 1 z i
4 2 x j
6 2 z f
7 3 x g
8 3 y k
10 4 x d
11 4 y h
12 4 z l
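If keeping the original row order matters, the exclusion can also be sketched in base R by comparing pasted keys instead of merging. This is only an illustration under an assumption: the outputs above imply that the excluded pairs are (1,x), (2,y) and (3,z), so the index table below is reconstructed from that output rather than taken from the answers, and the names index, key_dat and key_index are hypothetical.

dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12])

# Assumed exclusion table, inferred from the printed output (hypothetical)
index <- data.frame(col1 = 1:3, col2 = c("x", "y", "z"))

# Build one key per row from the two filter columns
key_dat   <- paste(dat$col1, dat$col2, sep = "_")
key_index <- paste(index$col1, index$col2, sep = "_")

# Drop the rows whose key appears in index; row order of dat is preserved
kept <- dat[!(key_dat %in% key_index), ]

The separator only needs to be a string that cannot occur inside the key values themselves; "_" is safe for the data used here.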
Source: https://stackoverflow.com/questions/46354932/selecting-rows-from-a-data-frame-from-combinations-of-lists