I have been using the R which
function to remove rows from a data frame. I recently discovered that if the search term is NOT in the data.frame, the result is an empty character.
# 1: returns A-Q, S-Z (as expected)
LETTERS[-which(LETTERS == "R")]
# 2: returns "character(0)" (not what I would expect)
LETTERS[-which(LETTERS == "1")]
# 3: returns A-Z (expected)
LETTERS[which(LETTERS != "1")]
# 4: returns A-Q, S-Z (expected)
LETTERS[which(LETTERS != "R")]
Is the second example the expected behavior for -which()
when the search term is not found? I have already switched my code to use the syntax in example 4, which seems safer, but I am just curious.
That is a well-known pitfall. When nothing matches the logical test the which-function returns numeric(0) and then "[" returns nothing instead of returning everything which would be expected. You can use:
LETTERS[ ! LETTERS == "1" ]
LETTERS[ ! LETTERS %in% "1" ]
There is another gotcha to be aware of and is the one that makes me choose to use which(). When using logical indexing an NA value used inside "[" will return a row. I generally do not want that so I use DFRM[ which(logical) ]
although this seems to bother some people who say is is not needed. I just think they are working with small datasets and infrequently encounter the annoyance of seeing tens of thousands of NA-induced useless lines of output on their console. I never use the negated which version though.
Because of this:
which(LETTERS == '-1')
## integer(0)
and this:
(1:2)[integer(0)]
integer(0)
Instead of #4, use this:
LETTERS[LETTERS != "R"]
In example 2, which
returns integer(0)
(a zero-length integer vector) because no values are TRUE
. A negative zero-length vector (-integer(0)
) is still a zero-length vector. So you're essentially asking for the NULL
element of LETTERS
, which doesn't exist.
来源:https://stackoverflow.com/questions/16047825/unexpected-behavior-using-which-in-r-when-the-search-term-is-not-found