Question
I want to do something similar to what is described in this thread: Subset multiple columns in R - more elegant code?
I have data that looks like this:
df=data.frame(x=1:4,Col1=c("A","A","C","B"),Col2=c("A","B","B","A"),Col3=c("A","C","C","A"))
criteria="A"
What I want to do is subset the data where criteria is met in at least two columns, that is, the string in at least two of the three columns is A. In the case above, the subset would be the first and last rows of the data frame df.
Answer 1:
You can use rowSums:
df[rowSums(df[-1] == criteria) >= 2, ]
# x Col1 Col2 Col3
#1 1 A A A
#4 4 B A A
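To see why this works, it may help to look at the intermediate values. The following is only a sketch using the same df and criteria as in the question; the names hits, n_hits and result are introduced here for illustration, and stringsAsFactors = FALSE is an added assumption so that the columns are plain character vectors in every R version.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"),
                 stringsAsFactors = FALSE)  # assumption: keep columns as character
criteria <- "A"

# Logical matrix: TRUE wherever a cell in Col1..Col3 equals the criterion
hits <- df[-1] == criteria

# TRUE counts as 1, so rowSums gives the number of matching columns per row
n_hits <- rowSums(hits)

# Keep only the rows with at least two matching columns
result <- df[n_hits >= 2, ]
```

Here n_hits is 3, 1, 0, 2 for the four rows, so only rows 1 and 4 pass the >= 2 test.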
If criteria has length > 1, you cannot use == directly; in that case use sapply with %in%.
df[rowSums(sapply(df[-1], `%in%`, criteria)) >= 2, ]
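As a hedged illustration of the multi-value case, the sketch below reuses the question's data with a two-element criteria vector. The values c("A", "B") and the names hits and result are assumptions made for this example, not part of the original question. The reason for %in% is that == compares element by element (recycling the shorter vector) rather than testing membership in a set.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"),
                 stringsAsFactors = FALSE)  # assumption: keep columns as character
criteria <- c("A", "B")  # hypothetical multi-value criteria

# For each column, test membership in criteria; sapply binds the
# per-column logical vectors into a 4 x 3 logical matrix
hits <- sapply(df[-1], `%in%`, criteria)

# Keep rows where at least two columns contain "A" or "B"
result <- df[rowSums(hits) >= 2, ]
```

With these values, rows 1, 2 and 4 are kept; row 3 (C, B, C) has only one match.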
In dplyr you can use filter with rowwise:
library(dplyr)
df %>%
  rowwise() %>%
  filter(sum(c_across(starts_with('col')) %in% criteria) >= 2)
Answer 2:
We can use subset with apply:
subset(df, apply(df[-1] == criteria, 1, sum) > 1)
# x Col1 Col2 Col3
#1 1 A A A
#4 4 B A A
Source: https://stackoverflow.com/questions/64259979/subset-multiple-columns-in-r-with-multiple-matches