Question
Usually, when I want to subset a data frame on several values of a variable, I use subset and %in%:
x <- data.frame(u=1:10,v=LETTERS[1:10])
x
subset(x, v %in% c("A","D"))
Now I have found that == also gives the same result:
subset(x, v == c("A","D"))
I'm wondering whether the two are identical, or whether there is a reason to prefer one over the other. Thanks for your help.
Edit (@MrFlick): This question is not the same as this one, which asks how to exclude several values: (!x %in% c('a','b')). I am asking why I got the same result with == and with %in%.
Answer 1:
You should use the first one, %in%. With == you got the expected result only because, in the example dataset, A and D happen to sit at positions that line up with the recycled values of c("A", "D"). Here, == is comparing x$v element by element against
rep(c("A", "D"), length.out= nrow(x))
#[1] "A" "D" "A" "D" "A" "D" "A" "D" "A" "D"
x$v == rep(c("A", "D"), length.out = nrow(x)) # TRUE at positions 1 and 4 only by coincidence
#[1] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
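If the order of the values is reversed, == returns no rows:
subset(x, v == c("D","A"))
#[1] u v
#<0 rows> (or 0-length row.names)
because the recycled comparison is now FALSE at every position: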
x$v==rep(c("D", "A"), length.out= nrow(x))
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
whereas %in% works regardless of the order:
subset(x, v %in% c("D","A"))
# u v
#1 1 A
#4 4 D
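The difference described above can be checked directly. The following is a minimal sketch using the same x as in the question; the variable names (recycled, by_equal, by_in, by_or) are illustrative only. It assumes that == recycles the shorter vector, and that %in% behaves like match(...) > 0, which is how the base R documentation describes it.

```r
x <- data.frame(u = 1:10, v = LETTERS[1:10])

# `==` compares element by element, recycling the shorter vector,
# so it is the same as comparing against the explicitly repeated values.
recycled <- rep(c("D", "A"), length.out = nrow(x))
by_equal <- x$v == c("D", "A")
identical(by_equal, x$v == recycled)

# `%in%` asks, for each element of x$v, whether it occurs anywhere in the set,
# so the order of the values in the set does not matter.
by_in <- x$v %in% c("D", "A")
identical(by_in, match(x$v, c("D", "A"), nomatch = 0L) > 0L)

# To get the same effect with `==`, test each value separately and combine with `|`.
by_or <- x$v == "A" | x$v == "D"
identical(by_in, by_or)

# Rows selected by the set-membership test
x[by_in, ]
```

Note that when the length of the longer vector is not a multiple of the shorter one, == still recycles but emits a warning, which is often the first hint that %in% was intended.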
Source: https://stackoverflow.com/questions/26805267/r-subset-with-condition-using-in-or-which-one-should-be-used