Selecting rows from a data frame from combinations of lists [duplicate]

Posted by 雨燕双飞 on 2019-12-02 19:32:32

Question


I have a dataframe, dat:

dat<-data.frame(col1=rep(1:4,3),
                col2=rep(letters[24:26],4),
                col3=letters[1:12])

I want to filter dat on two different columns using ONLY the combinations given by the rows in the data frame filter:

filter<-data.frame(col1=1:3,col2=NA)
lists<-list(list("x","y"),list("y","z"),list("x","z"))
filter$col2<-lists

For example, rows containing (1,x) and (1,y) would be selected, but not (1,z), (2,x), or (3,y).

I know how I would do it using a for loop:

#create an empty frame to drop results in (dat[1,] would seed it with a stray copy of row 1)
results<-dat[0,]
for(f in 1:nrow(filter)){
  temp_filter<-filter[f,]
  temp_dat<-dat[dat$col1==temp_filter[1,1] &
                dat$col2%in%unlist(temp_filter[1,2]),]
  results<-rbind(results,temp_dat)
}

Or if you prefer dplyr style:

library(dplyr)
results<-dat[0,]
for(f in 1:nrow(filter)){
  temp_filter<-filter[f,]
  #dplyr::filter is spelled out because a data frame named filter also exists
  temp_dat<-dplyr::filter(dat,col1==temp_filter[1,1] &
                          col2%in%unlist(temp_filter[1,2]))
  results<-rbind(results,temp_dat)
}

results should contain:

  col1 col2 col3
1    1    x    a
5    1    y    e
2    2    y    b
6    2    z    f
3    3    z    c
7    3    x    g

I would normally do the filtering using a merge, but I can't now since I have to check col2 against a list rather than a single value. The for loop works but I figured there would be a more efficient way to do this, probably using some variation of apply or do.call.
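One loop-free way to approach this, sketched here in base R under the assumption that every element of filter$col2 is a list of single values as in the example: flatten the list column into one row per allowed (col1, col2) pair, and then the usual merge works again. The name filter_long is made up for this sketch. Note that merge() sorts by the key columns, so row order and row names will differ from the loop's output.

```r
# Rebuild the question's data (stringsAsFactors = FALSE keeps col2 as
# character on R versions before 4.0 as well).
dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12],
                  stringsAsFactors = FALSE)

filter <- data.frame(col1 = 1:3, col2 = NA)
filter$col2 <- list(list("x", "y"), list("y", "z"), list("x", "z"))

# Flatten the list column: one row per allowed (col1, col2) pair.
# filter_long is a hypothetical name introduced for this sketch.
filter_long <- data.frame(col1 = rep(filter$col1, lengths(filter$col2)),
                          col2 = unlist(filter$col2),
                          stringsAsFactors = FALSE)

# An inner join keeps only the rows of dat whose pair appears in filter_long.
results <- merge(dat, filter_long, by = c("col1", "col2"))
```

With the example data this keeps the same six rows as the loop (col3 values a, b, c, e, f, g), just in a different order.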


Answer 1:


We could use dplyr::anti_join() to do the row-exclusion filtering for us, given two data frames:

index <- data.frame(col1 = as.character(filter[,1]),
                    col2 = filter[,2])

anti_join(dat, index)

Joining, by = c("col1", "col2")
  col1 col2 col3
1    4    x    d
2    1    y    e
3    2    z    f
4    3    x    g
5    4    y    h
6    1    z    i
7    2    x    j
8    3    y    k
9    4    z    l



Answer 2:


Mostly base R, with a little help from dplyr:

dplyr::setdiff(dat,merge(dat,setNames(as.data.frame(filter),names(dat)[1:2])))

  col1 col2 col3
1    4    x    d
2    1    y    e
3    2    z    f
4    3    x    g
5    4    y    h
6    1    z    i
7    2    x    j
8    3    y    k
9    4    z    l

A pure base R solution, though it is not so pretty and you lose the original row order:

subset(merge(dat,`[[<-`(setNames(as.data.frame(filter),names(dat)[1:2]),"x",value=1),all.x=TRUE),is.na(x),-4)

   col1 col2 col3
2     1    y    e
3     1    z    i
4     2    x    j
6     2    z    f
7     3    x    g
8     3    y    k
10    4    x    d
11    4    y    h
12    4    z    l
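
If the row order matters, a base R sketch that keeps it is to build a combined key per row and drop the matches with %in%. This assumes the pairs to exclude are (1,x), (2,y) and (3,z), which is inferred from the outputs printed above rather than stated anywhere; excl and pair_key are hypothetical names used only here.

```r
dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12],
                  stringsAsFactors = FALSE)

# Assumed exclusion pairs (inferred from the printed outputs, not given).
excl <- data.frame(col1 = 1:3,
                   col2 = c("x", "y", "z"),
                   stringsAsFactors = FALSE)

# Combine both key columns into one string per row; the tab separator
# avoids accidental collisions between pasted values.
pair_key <- function(df) paste(df$col1, df$col2, sep = "\t")

# Keep rows of dat whose key is NOT among the excluded keys.
# Logical indexing preserves the original row order and row names.
kept <- dat[!(pair_key(dat) %in% pair_key(excl)), ]
```

Dropping the `!` turns the same expression into a keep-only-matches filter.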


Source: https://stackoverflow.com/questions/46354932/selecting-rows-from-a-data-frame-from-combinations-of-lists
