I have a data frame with several factor columns containing NaN's that I would like to convert to NA's (the NaN seems to be a problem for
EDIT:
Gavin Simpson in comments reminds me that, in your situation, there are much easier ways to convert what is really an "NaN" to an "NA":
tester1 <- gsub("NaN", "NA", tester1)
tester1
# [1] "2" "2" "3" "4" "2" "3" "NA"
Solution:
To detect which elements of the character vector are NaN, you need to convert the vector to a numeric vector:
tester1[is.nan(as.numeric(tester1))] <- "NA"
tester1
# [1] "2" "2" "3" "4" "2" "3" "NA"
Explanation:
There are a couple of reasons that this isn't working as you expect it to.
First, although NaN stands for "Not a Number", it does have class "numeric", and only makes sense inside of a numeric vector.
Second, when it is included in a character vector, the symbol NaN is silently converted to the character string "NaN". When you then test it for NaN-ness, the character string returns FALSE:
class(NaN)
# [1] "numeric"
c("1", NaN)
# [1] "1" "NaN"
is.nan(c("1", NaN))
# [1] FALSE FALSE
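For comparison, here is a small sketch of how the two tests behave on a genuinely numeric vector. The vector `x` is made up for illustration and is not from the question:

```r
# Hypothetical numeric vector containing both NaN and NA
x <- c(1, NaN, NA)

nan_flags <- is.nan(x)  # TRUE only for the NaN element
na_flags  <- is.na(x)   # TRUE for both the NaN and the NA element

print(nan_flags)
print(na_flags)
```

In a numeric vector, `is.na()` also flags NaN, while `is.nan()` does not flag NA, so the two tests are not interchangeable.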
Here's the problem: your vector is character in mode, so of course it's "not a number". That last element was interpreted as the string "NaN". Using is.nan only makes sense if the vector is numeric. If you want to make a value missing in a character vector (so that it gets handled properly by regression functions), use NA_character_ (without any quotes).
> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
> tester1
[1] "2" "2" "3" "4" "2" "3" NA
> is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Neither "NA" nor "NaN" is really missing in a character vector. If for some reason a factor variable contained values that were "NaN", you could just use logical indexing:
tester1[tester1 == "NaN"] = "NA"
# but that would not really be a missing value either
# and it might screw up a factor variable anyway.
tester1[tester1=="NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
invalid factor level, NAs generated
##########
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2 2 3 4 2 3 <NA>
Levels: 2 3 4 NaN
That last result might be surprising. There is a remaining "NaN" level, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, shown in print as <NA>.
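If the leftover level is unwanted, one option is to drop it with `droplevels()`. This is a minimal sketch rather than part of the original answer; the factor `f` is a hypothetical stand-in for tester1:

```r
# Factor with a literal "NaN" level, mirroring the example above
f <- factor(c("2", "2", "3", "4", "2", "3", "NaN"))

# Turn the "NaN" entries into real missing values
f[f == "NaN"] <- NA

# Remove the now-unused "NaN" level
f <- droplevels(f)

print(levels(f))
print(is.na(f))
```

After this, the factor has only the levels that actually occur, and the former "NaN" element is reported as missing by `is.na()`.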
You can't have NaN in a character vector, which is what you have here:
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> is.nan(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
Notice how R thinks this is a character string.
You can create NaN in a numeric vector:
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> as.numeric(tester1)
[1] 2 2 3 4 2 3 NaN
> is.nan(as.numeric(tester1))
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Then, of course, R can convert NaN to NA as per your code:
> foo <- as.numeric(tester1)
> foo[is.nan(foo)] <- NA
> foo
[1] 2 2 3 4 2 3 NA
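Since the question mentions a data frame with several factor columns, the same idea can be applied column by column. The following is only a sketch under assumptions: the data frame `df` and its column names are invented for illustration, and it assumes the problem values are stored as the literal level "NaN".

```r
# Hypothetical data frame: two factor columns and one numeric column
df <- data.frame(
  a = factor(c("2", "3", "NaN")),
  b = factor(c("x", "NaN", "y")),
  n = c(1, 2, 3)
)

# Find the factor columns
is_fac <- vapply(df, is.factor, logical(1))

# In each factor column, replace "NaN" entries with NA and drop the unused level
df[is_fac] <- lapply(df[is_fac], function(col) {
  col[col %in% "NaN"] <- NA
  droplevels(col)
})

print(df)
```

Using `%in%` rather than `==` keeps the logical index free of NA values if a column already contains missing entries. Non-factor columns are left untouched.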