Subset multiple columns in R with multiple matches

只愿长相守 submitted on 2021-01-29 14:39:08

Question


I want to do something similar to this thread: Subset multiple columns in R - more elegant code?

I have data that looks like this:

df=data.frame(x=1:4,Col1=c("A","A","C","B"),Col2=c("A","B","B","A"),Col3=c("A","C","C","A"))
criteria="A"

What I want to do is subset the data to the rows where criteria is met in at least two columns, that is, where the string in at least two of the three columns is A. In the case above, the subset would be the first and last rows of the data frame df.


Answer 1:


You can use rowSums:

df[rowSums(df[-1] == criteria) >= 2, ]

#  x Col1 Col2 Col3
#1 1    A    A    A
#4 4    B    A    A
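
To see why this works, it may help to look at the intermediate values. The sketch below reuses the df and criteria from the question; the names hits and n_match are introduced here purely for illustration and do not appear in the original answer.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))
criteria <- "A"

# Logical matrix: TRUE wherever a cell equals criteria (column x is dropped)
hits <- df[-1] == criteria

# Number of matching columns in each row (illustrative name)
n_match <- rowSums(hits)

# Keep only the rows with at least two matches
result <- df[n_match >= 2, ]
```

Here n_match holds 3, 1, 0 and 2 for the four rows, so only rows 1 and 4 pass the >= 2 test.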

If criteria has length > 1, you cannot use == directly, because == compares element-wise (recycling criteria) rather than testing membership. In that case, use sapply with %in%:

df[rowSums(sapply(df[-1], `%in%`, criteria)) >= 2, ]
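
As a rough illustration of the multi-value case, the sketch below uses a hypothetical two-element vector, criteria2, which is an assumption and not part of the question. A cell counts as a match if it equals any value in that vector.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))

# Hypothetical multi-value criteria (not from the original question)
criteria2 <- c("A", "B")

# sapply applies %in% to each column and binds the results into a logical matrix
in_set <- sapply(df[-1], `%in%`, criteria2)

# Rows where at least two columns hold a value from criteria2
result2 <- df[rowSums(in_set) >= 2, ]
```

With these values, rows 1, 2 and 4 are kept; row 3 has only one cell (B in Col2) that falls in criteria2.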

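In dplyr, you can use filter with rowwise:

library(dplyr)
df %>%
  rowwise() %>%
  # count, within each row, how many of the Col* values are in criteria
  filter(sum(c_across(starts_with('Col')) %in% criteria) >= 2) %>%
  ungroup()  # drop the row-wise grouping from the result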



Answer 2:


We can use subset with apply:

subset(df, apply(df[-1] == criteria, 1, sum) > 1)
#   x Col1 Col2 Col3
#1 1    A    A    A
#4 4    B    A    A


Source: https://stackoverflow.com/questions/64259979/subset-multiple-columns-in-r-with-multiple-matches
