How do I replace NA values with zeros in an R dataframe?

说谎 2020-11-21 23:41

I have a data frame and some columns have NA values.

How do I replace these NA values with zeroes?

20 answers
  • 2020-11-22 00:07

    An easy way to write it is with if_na from the hablar package:

    library(dplyr)
    library(hablar)
    
    df <- tibble(a = c(1, 2, 3, NA, 5, 6, 8))
    
    df %>% 
      mutate(a = if_na(a, 0))
    

    which returns:

          a
      <dbl>
    1     1
    2     2
    3     3
    4     0
    5     5
    6     6
    7     8
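
    If hablar is not installed, a similar result can be sketched with dplyr's own coalesce(). This swaps in a different function from the one the answer uses, so treat it as an alternative under that assumption rather than as hablar's behaviour; the name result is only illustrative:

    ```r
    library(dplyr)

    df <- tibble(a = c(1, 2, 3, NA, 5, 6, 8))

    # coalesce() takes the first non-missing value at each position,
    # so every NA in `a` falls back to 0
    result <- df %>%
      mutate(a = coalesce(a, 0))
    ```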
    
  • 2020-11-22 00:08

    data.table provides dedicated functions for this purpose: nafill and setnafill. Where multiple threads are available, they distribute the columns across them.

    library(data.table)
    
    ans_df <- nafill(df, fill=0)
    
    # or even faster, in-place
    setnafill(df, fill=0)
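
    A minimal, self-contained sketch of the in-place variant. The table dt and its column names are made up for illustration. As far as I know these functions only handle numeric (integer and double) columns, so the sketch restricts the fill to those via the cols argument:

    ```r
    library(data.table)

    # Hypothetical data: two numeric columns with NAs and one character column
    dt <- data.table(
      x = c(1, NA, 3),
      y = c(NA, 5, 6),
      label = c("a", NA, "c")
    )

    # Select the numeric columns, then fill their NAs by reference (no copy)
    num_cols <- names(dt)[sapply(dt, is.numeric)]
    setnafill(dt, fill = 0, cols = num_cols)
    ```

    The character column label is left untouched, including its NA.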
    
  • 2020-11-22 00:09

    See my comment on @gsk3's answer. A simple example:

    > m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
    > d <- as.data.frame(m)
    > d
       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  3 NA  3  7  6  6 10  6   5
    2   9  8  9  5 10 NA  2  1  7   2
    3   1  1  6  3  6 NA  1  4  1   6
    4  NA  4 NA  7 10  2 NA  4  1   8
    5   1  2  4 NA  2  6  2  6  7   4
    6  NA  3 NA NA 10  2  1 10  8   4
    7   4  4  9 10  9  8  9  4 10  NA
    8   5  8  3  2  1  4  5  9  4   7
    9   3  9 10  1  9  9 10  5  3   3
    10  4  2  2  5 NA  9  7  2  5   5
    
    > d[is.na(d)] <- 0
    
    > d
       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  3  0  3  7  6  6 10  6   5
    2   9  8  9  5 10  0  2  1  7   2
    3   1  1  6  3  6  0  1  4  1   6
    4   0  4  0  7 10  2  0  4  1   8
    5   1  2  4  0  2  6  2  6  7   4
    6   0  3  0  0 10  2  1 10  8   4
    7   4  4  9 10  9  8  9  4 10   0
    8   5  8  3  2  1  4  5  9  4   7
    9   3  9 10  1  9  9 10  5  3   3
    10  4  2  2  5  0  9  7  2  5   5
    

    There's no need to apply apply. =)
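
    To see why the one-liner works, here is a small deterministic sketch (the data frame is made up for illustration): is.na() on a data frame returns a logical matrix of the same shape, and assigning through that matrix overwrites only the missing cells.

    ```r
    d <- data.frame(V1 = c(4, NA, 1), V2 = c(NA, 8, NA))

    # Logical matrix: TRUE wherever a cell is NA
    mask <- is.na(d)

    # Assignment through the mask touches only the TRUE cells
    d[mask] <- 0
    ```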

    EDIT

    You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)

  • 2020-11-22 00:12

    I know the question is already answered, but doing it this way might be more useful to some:

    Define this function:

    na.zero <- function (x) {
        x[is.na(x)] <- 0
        return(x)
    }
    

    Now whenever you need to convert the NAs in a vector to zeros, you can do:

    na.zero(some.vector)
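
    Since the question is about a data frame, one way to reuse the helper is to apply it column by column. This is a sketch with made-up data, assuming all columns are numeric:

    ```r
    na.zero <- function(x) {
      x[is.na(x)] <- 0
      x
    }

    df <- data.frame(a = c(1, NA, 3), b = c(NA, 2, NA))

    # Assigning into df[] replaces every column but keeps the data.frame structure
    df[] <- lapply(df, na.zero)
    ```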
    
  • 2020-11-22 00:12

    Would've commented on @ianmunoz's post but I don't have enough reputation. You can combine dplyr's mutate_each and replace to take care of the NA-to-0 replacement (with dplyr and lazyeval loaded). Using a data frame built the same way as in @aL3xa's answer...

    > m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
    > d <- as.data.frame(m)
    > d
    
        V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  8  1  9  6  9 NA  8  9   8
    2   8  3  6  8  2  1 NA NA  6   3
    3   6  6  3 NA  2 NA NA  5  7   7
    4  10  6  1  1  7  9  1 10  3  10
    5  10  6  7 10 10  3  2  5  4   6
    6   2  4  1  5  7 NA NA  8  4   4
    7   7  2  3  1  4 10 NA  8  7   7
    8   9  5  8 10  5  3  5  8  3   2
    9   9  1  8  7  6  5 NA NA  6   7
    10  6 10  8  7  1  1  2  2  5   7
    
    > d %>% mutate_each( funs_( interp( ~replace(., is.na(.),0) ) ) )
    
        V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  8  1  9  6  9  0  8  9   8
    2   8  3  6  8  2  1  0  0  6   3
    3   6  6  3  0  2  0  0  5  7   7
    4  10  6  1  1  7  9  1 10  3  10
    5  10  6  7 10 10  3  2  5  4   6
    6   2  4  1  5  7  0  0  8  4   4
    7   7  2  3  1  4 10  0  8  7   7
    8   9  5  8 10  5  3  5  8  3   2
    9   9  1  8  7  6  5  0  0  6   7
    10  6 10  8  7  1  1  2  2  5   7
    

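    We're using standard evaluation (SE) here, which is why we need the underscore in "funs_". We also use lazyeval's interp with a ~ formula, and the . refers to each column in turn as mutate_each applies the function across the data frame. Now there are zeros! Note that mutate_each and funs_ have since been deprecated in dplyr.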
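
    In current dplyr (1.0.0 or later, which is an assumption about your installed version), the same column-wise replacement can be sketched with across(). The small data frame here is made up for illustration:

    ```r
    library(dplyr)

    d <- data.frame(V1 = c(4, NA, 6), V2 = c(NA, 3, NA))

    # across(everything(), ...) applies the lambda to each column;
    # .x stands for the current column
    d_filled <- d %>%
      mutate(across(everything(), ~ replace(.x, is.na(.x), 0)))
    ```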

  • 2020-11-22 00:15

    A more general approach is to use replace() on a matrix or vector to replace NA with 0.

    For example:

    > x <- c(1,2,NA,NA,1,1)
    > x1 <- replace(x,is.na(x),0)
    > x1
    [1] 1 2 0 0 1 1
    

    This is also an alternative to using ifelse() in dplyr:

    library(dplyr)

    df <- data.frame(col = c(1, 2, NA, NA, 1, 1))
    df <- df %>%
      mutate(col = replace(col, is.na(col), 0))
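
    For comparison, here is a sketch of the ifelse() form that the replace() call stands in for. The names with_replace and with_ifelse are only illustrative:

    ```r
    library(dplyr)

    df <- data.frame(col = c(1, 2, NA, NA, 1, 1))

    # replace(): overwrite the positions where the condition is TRUE
    with_replace <- df %>%
      mutate(col = replace(col, is.na(col), 0))

    # ifelse(): choose element by element between 0 and the original value
    with_ifelse <- df %>%
      mutate(col = ifelse(is.na(col), 0, col))
    ```

    Both produce the same column here; replace() avoids spelling out the "otherwise keep the original" branch.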
    