Set NA to 0 in R

心在旅途 2020-11-29 03:17

After merging one data frame with another, I'm left with NAs in the occasional row. I'd like to set these NAs to 0 so I can perform calculations with them.

4 Answers
  • 2020-11-29 03:33

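    Why not try this:

    na.zero <- function(x) {
      x[is.na(x)] <- 0   # works on vectors and on whole data frames
      return(x)
    }
    df <- na.zero(df)    # the function returns a modified copy, so assign it back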
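
    To tie this back to the question, here is a minimal sketch of a merge that produces NAs and of the helper cleaning them up. The tables `left` and `right` and their columns are made up for illustration; they are not from the original post.

    ```r
    # Hypothetical data: `left` and `right` are invented for this sketch.
    left  <- data.frame(id = 1:4, a = c(10, 20, 30, 40))
    right <- data.frame(id = c(1, 3), b = c(5, 7))

    # A left join leaves NA in `b` for every id that has no match in `right`.
    merged <- merge(left, right, by = "id", all.x = TRUE)

    # Same helper as in the answer, repeated so the snippet runs on its own.
    na.zero <- function(x) {
      x[is.na(x)] <- 0
      x
    }

    merged <- na.zero(merged)

    # Arithmetic now works without propagating NA.
    merged$total <- merged$a + merged$b
    ```

    Note that this treats "no match" as a true zero, which may or may not be what the calculation needs.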
    
  • 2020-11-29 03:37

    You can just use the output of is.na to replace directly with subsetting:

    bothbeams.data[is.na(bothbeams.data)] <- 0
    

    Or with a reproducible example:

    dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
    dfr[is.na(dfr)] <- 0
    dfr
      x y
    1 1 0
    2 2 4
    3 3 5
    4 0 6
    

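    However, be careful using this method on a data frame containing factors that also have missing values. (Since R 4.0.0, data.frame() only turns strings into factors when stringsAsFactors = TRUE, so it is set explicitly here.)

    > d <- data.frame(x = c(NA,2,3),y = c("a",NA,"c"), stringsAsFactors = TRUE)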
    > d[is.na(d)] <- 0
    Warning message:
    In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
      invalid factor level, NA generated
    

    It "works":

    > d
      x    y
    1 0    a
    2 2 <NA>
    3 3    c
    

    ...but in this case you will likely want to alter only the numeric columns rather than the whole data frame. See, e.g., the answer below using dplyr::mutate_if.
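
    If you would rather stay in base R, the same idea can be sketched as follows. This is only one possible approach, not taken from the answers here; the name `num_cols` is invented for the example.

    ```r
    # Made-up data frame with one numeric and one factor column, both containing NA.
    d <- data.frame(x = c(NA, 2, 3), y = factor(c("a", NA, "c")))

    # Flag the numeric columns, then zero-fill NAs only within those columns.
    num_cols <- vapply(d, is.numeric, logical(1))
    d[num_cols][is.na(d[num_cols])] <- 0

    d
    ```

    The factor column keeps its NA and no warning is raised, because the assignment never touches it.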

  • 2020-11-29 03:42

    A solution using mutate_all from dplyr, in case you want to add it to your dplyr pipeline:

    library(dplyr)
    # funs() is deprecated since dplyr 0.8.0; a formula does the same job
    df %>%
      mutate_all(~ ifelse(is.na(.), 0, .))
    

    Result:

       A B C
    1  1 1 2
    2  2 2 5
    3  3 1 2
    4  0 2 0
    5  1 1 0
    6  2 2 0
    7  3 1 3
    8  0 2 0
    9  1 1 3
    10 2 2 3
    11 3 1 0
    12 0 2 3
    13 1 1 4
    14 2 2 4
    15 3 1 0
    16 0 2 0
    17 1 1 1
    18 2 2 0
    19 3 1 2
    20 0 2 0
    

    If you only want to replace the NAs in numeric columns, which I assume is often the case in modeling, you can use mutate_if:

    library(dplyr)
    # funs() is deprecated since dplyr 0.8.0; a formula does the same job
    df %>%
      mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))
    

    or in base R:

    # replace(x, list, values): the data frame itself must be the first argument
    replace(df, is.na(df), 0)
    

    Result:

       A  B C
    1  1  0 2
    2  2 NA 5
    3  3  0 2
    4  0 NA 0
    5  1  0 0
    6  2 NA 0
    7  3  0 3
    8  0 NA 0
    9  1  0 3
    10 2 NA 3
    11 3  0 0
    12 0 NA 3
    13 1  0 4
    14 2 NA 4
    15 3  0 0
    16 0 NA 0
    17 1  0 1
    18 2 NA 0
    19 3  0 2
    20 0 NA 0
    

    Data:

    set.seed(123)
    # A uses 1:3 (not 0:3) so that all three columns have 20 rows, matching the results above
    df <- data.frame(A = rep(c(1:3, NA), 5), B = rep(c("0", "NA"), 10), C = sample(c(0:5, NA), 20, replace = TRUE))
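
    On dplyr 1.0.0 or later, the scoped verbs (mutate_all, mutate_if) are superseded by across(). A rough equivalent of the mutate_if version might look like the sketch below; it assumes dplyr >= 1.0.0 is installed and uses a small made-up data frame rather than the data above.

    ```r
    library(dplyr)

    # Made-up data: numeric columns with real NAs, plus a character column
    # that holds the literal string "NA" (which is not a missing value).
    df <- data.frame(
      A = c(1, NA, 3),
      B = c("0", "NA", "0"),
      C = c(NA, 5, NA)
    )

    # Replace NA with 0 in numeric columns only; B is left untouched.
    out <- df %>%
      mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))

    out
    ```

    Using replace() instead of ifelse() inside the lambda keeps each column's type and attributes as they were.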
    
  • 2020-11-29 03:44

    To add to James's example: it seems you need to create an intermediate copy when performing calculations on data frames that contain NAs, at least if you want to leave the original values untouched.

    For instance, adding two columns (A and B) together from a data frame dfr:

    temp.df <- data.frame(dfr) # copy the original
    temp.df[is.na(temp.df)] <- 0
    dfr$C <- temp.df$A + temp.df$B # or any other calculation
    remove('temp.df')
    

    When I do this I throw away the intermediate afterwards with remove/rm.
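
    As a small worked version of the above, with made-up values for A and B: the copy is zero-filled, the original columns keep their NAs, and the result lands in a new column. For the specific case of a sum, rowSums() with na.rm = TRUE appears to give the same result without a copy; the column name C2 is invented for the comparison.

    ```r
    # Made-up data with the columns A and B used above.
    dfr <- data.frame(A = c(1, NA, 3), B = c(NA, 4, 5))

    # Zero-fill a copy so that the NAs in the original columns are preserved.
    temp.df <- dfr
    temp.df[is.na(temp.df)] <- 0
    dfr$C <- temp.df$A + temp.df$B
    rm(temp.df)

    # For a plain sum, rowSums() with na.rm = TRUE skips NAs without a copy.
    dfr$C2 <- rowSums(dfr[c("A", "B")], na.rm = TRUE)

    dfr
    ```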
