R can't convert NaN to NA

心在旅途 2021-02-15 16:08

I have a data frame with several factor columns containing NaN's that I would like to convert to NA's (the NaN seems to be a problem for

3 Answers
  • 2021-02-15 16:13

    EDIT:

    Gavin Simpson in comments reminds me that, in your situation, there are much easier ways to convert what is really an "NaN" to an "NA":

    tester1 <- gsub("NaN", "NA", tester1)
    tester1
    # [1] "2"  "2"  "3"  "4"  "2"  "3"  "NA"
    

    Solution:

    To detect which elements of the character vector are NaN, you need to convert the vector to a numeric vector:

    tester1[is.nan(as.numeric(tester1))] <- "NA"
    tester1
    # [1] "2"  "2"  "3"  "4"  "2"  "3"  "NA"
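
    Note that both snippets above store the character string "NA", which R does not treat as missing. A minimal sketch of the difference, assuming the same tester1 vector used elsewhere in this thread (the names as_string and real_na are only illustrative):

```r
# Same example vector as in the other answers; NaN is coerced to the string "NaN".
tester1 <- c("2", "2", "3", "4", "2", "3", NaN)

# gsub() replaces the text, so the result holds the string "NA", not a missing value.
as_string <- gsub("NaN", "NA", tester1)
any(is.na(as_string))

# Assigning NA without quotes stores a real missing value instead.
real_na <- tester1
real_na[real_na == "NaN"] <- NA
is.na(real_na)
```

    If the goal is for modelling functions to treat the value as missing, the second form is probably the one you want.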
    

    Explanation:

    There are a couple of reasons that this isn't working as you expect it to.

    First, although NaN stands for "Not a Number", it does have class "numeric", and only makes sense inside of a numeric vector.

    Second, when it is included in a character vector, the symbol NaN is silently converted to the character string "NaN". When you then test it for nan-ness, the character string returns FALSE:

    class(NaN)
    # [1] "numeric"
    c("1", NaN)
    # [1] "1"   "NaN"
    is.nan(c("1", NaN))
    # [1] FALSE FALSE
    
  • 2021-02-15 16:28

    Here's the problem: your vector is character in mode, so of course it's "not a number". That last element got interpreted as the string "NaN". Using is.nan only makes sense if the vector is numeric. If you want to make a value missing in a character vector (so that it gets handled properly by regression functions), then use NA_character_ (without any quotes).

    > tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
    >  tester1
    [1] "2" "2" "3" "4" "2" "3" NA 
    >  is.na(tester1)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
    

    Neither "NA" nor "NaN" is really missing in a character vector. If for some reason there were values in a factor variable that were "NaN", then you would have been able to just use logical indexing:

    tester1[tester1 == "NaN"] = "NA"  
    # but that would not really be a missing value either 
    # and it might screw up a factor variable anyway.
    
    tester1[tester1=="NaN"] <- "NA"
    Warning message:
    In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
    invalid factor level, NAs generated
    ##########
    tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
    
    > tester1[tester1 =="NaN"] <- NA_character_
    > tester1
    [1] 2    2    3    4    2    3    <NA>
    Levels: 2 3 4 NaN
    

    That last result might be surprising. There is a remaining "NaN" level, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, signified in print as <NA>.
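
    If the leftover "NaN" level is unwanted, one option is to drop unused levels afterwards. This is a small sketch using base R's droplevels(), which is an addition and not part of the original answer; it uses %in% for the index so that any existing NA elements do not produce NA subscripts:

```r
# Same factor as above; "NaN" becomes one of the levels.
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))

# Turn the "NaN" elements into real missing values.
tester1[tester1 %in% "NaN"] <- NA

# Remove levels that no element uses any more, including "NaN".
tester1 <- droplevels(tester1)
levels(tester1)
is.na(tester1)
```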

  • 2021-02-15 16:33

    You can't have NaN in a character vector, which is what you have here:

    > tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
    > is.nan(tester1)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    > tester1
    [1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
    

    Notice how R thinks this is a character string.

    You can create NaN in a numeric vector:

    > tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
    > as.numeric(tester1)
    [1]   2   2   3   4   2   3 NaN
    > is.nan(as.numeric(tester1))
    [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
    

    Then, of course, R can convert NaN to NA as per your code:

    > foo <- as.numeric(tester1)
    > foo[is.nan(foo)] <- NA
    > foo
    [1]  2  2  3  4  2  3 NA
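
    The question mentions a data frame with several factor columns, so the same idea can be applied column by column. The following is only a sketch under assumptions: the data frame df and the helper nan_to_na are hypothetical stand-ins, not taken from the question.

```r
# Hypothetical data frame standing in for the one described in the question.
df <- data.frame(
  a = factor(c("2", "3", "NaN")),
  b = factor(c("x", "NaN", "y")),
  n = c(1, NaN, 3)
)

# Hypothetical helper: turn "NaN" levels (factors) or NaN values (numerics) into NA.
nan_to_na <- function(col) {
  if (is.factor(col)) {
    col[col %in% "NaN"] <- NA   # real missing value, not the string "NA"
    droplevels(col)             # drop the now-unused "NaN" level
  } else if (is.numeric(col)) {
    col[is.nan(col)] <- NA
    col
  } else {
    col                         # leave other column types untouched
  }
}

# df[] keeps the data frame structure while replacing every column.
df[] <- lapply(df, nan_to_na)
str(df)
```

    Columns of other types are returned unchanged, so the helper should be safe to apply to the whole data frame.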
    