Question
I found many questions about subsetting by multiple conditions, but couldn't find how to subset rows that meet at least two out of more than two conditions.
This SO question deals with the same problem, but applies the same condition to all columns: Select rows with at least two conditions from all conditions
My question is: How can I subset rows by at least two out of three different conditions?
id <- c(1, 2, 3, 4, 5)
V1 <- c(2, 4, 4, 9, 7)
V2 <- c(10, 20, 20, 30, 20)
V3 <- c(0.7, 0.1, 0.5, 0.2, 0.9)
df <- data.frame(id, V1, V2, V3)
I can subset the rows that meet all three conditions with a loop like this:
# empty "subset" data.frame with the same columns as df
subdf <- df[0, ]
for (i in seq_len(nrow(df))) {
  # keep row i only if all three conditions hold
  if (df$V1[i] <= 4 && df$V2[i] >= 20 && df$V3[i] <= 0.3) {
    subdf <- rbind(subdf, df[i, ])
  }
}
Any ideas on how to subset all rows that fulfill either all three, or any combination of two conditions?
Many thanks in advance!
Answer 1:
Here's an extension of LukeA's answer to the linked question.
dfNew <- df[rowSums(cbind(df$V1 <= 4, df$V2 >= 20, df$V3 <= 0.3)) > 1,]
which returns
dfNew
  id V1 V2  V3
2  2  4 20 0.1
3  3  4 20 0.5
4  4  9 30 0.2
The idea is to construct a matrix of the logical vectors with cbind and then use rowSums to count the number of TRUE values for each row. The rows of the data.frame can then be subset based on this criterion.
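To make the intermediate steps visible, here is a minimal sketch that builds the logical matrix and the per-row counts separately, using the data from the question. The names conds, n_met and k are only illustrative and do not appear in the original answer.

```r
# Example data from the question
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 V1 = c(2, 4, 4, 9, 7),
                 V2 = c(10, 20, 20, 30, 20),
                 V3 = c(0.7, 0.1, 0.5, 0.2, 0.9))

# One logical column per condition, one row per row of df
conds <- cbind(df$V1 <= 4, df$V2 >= 20, df$V3 <= 0.3)

# TRUE counts as 1 and FALSE as 0, so rowSums gives the
# number of conditions each row satisfies
n_met <- rowSums(conds)
n_met
# [1] 1 3 2 2 1

# Keep the rows that satisfy at least k conditions (k is an assumed name)
k <- 2
dfNew <- df[n_met >= k, ]
```

If any of the columns can contain NA, the counts for those rows become NA as well; passing na.rm = TRUE to rowSums would count only the conditions that could be evaluated, which may or may not be what you want.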
Answer 2:
I use a trick to do something similar. See if you like it.
The approach is to write the conditions as text and evaluate them with eval(parse(text = ...)):
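id <- c(1, 2, 3, 4, 5)
V1 <- c(2, 4, 4, 9, 7)
V2 <- c(10, 20, 20, 30, 20)
V3 <- c(0.7, 0.1, 0.5, 0.2, 0.9)
df <- data.frame(id, V1, V2, V3)
# the conditions, stored as text
tests <- c("df$V1 <= 4", "df$V2 >= 20", "df$V3 <= 0.3")
tests
# evaluate each condition; res is a logical matrix with one column per test
res <- sapply(tests, FUN = function(txt) eval(parse(text = txt)))
# number of conditions met by each row
apply(res, 1, sum)
# keep the rows that meet at least two conditions
df[apply(res, 1, sum) >= 2, ]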
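Parsing strings works, but the conditions can also be kept as unevaluated expressions instead of text. The following is only a sketch of that variant, not part of the original answer. It assumes every condition refers to column names of df, and the names tests, res, k and out are illustrative.

```r
# Example data from the question
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 V1 = c(2, 4, 4, 9, 7),
                 V2 = c(10, 20, 20, 30, 20),
                 V3 = c(0.7, 0.1, 0.5, 0.2, 0.9))

# Conditions stored as unevaluated calls that refer to columns of df
tests <- list(quote(V1 <= 4), quote(V2 >= 20), quote(V3 <= 0.3))

# Evaluate each call with the columns of df in scope;
# vapply checks that every result is a logical vector of length nrow(df)
res <- vapply(tests, function(e) eval(e, envir = df), logical(nrow(df)))

# Keep the rows that meet at least k conditions
k <- 2
out <- df[rowSums(res) >= k, ]
```

With the example data this should select the same rows (ids 2, 3 and 4) as the text-based version, while avoiding string parsing and the repeated df$ prefix.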
Source: https://stackoverflow.com/questions/41041841/subset-by-at-least-two-out-of-multiple-conditions