Find and replace missing values with row mean

夕颜 2020-12-16 18:26

I have a data frame with NAs, and I want to replace each NA with the mean of its row:

c1 = c(1,2,3,NA)
c2 = c(3,1,NA,3)
c3 = c(2,1,3,1)

df = data.frame(c1,c2,c3)

5 Answers
  • 2020-12-16 19:09

    Very similar to @baptiste's answer

    > ind <- which(is.na(df), arr.ind=TRUE)
    > df[ind] <- rowMeans(df,  na.rm = TRUE)[ind[,1]]
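
To make the two lines above easier to follow, here is a minimal annotated sketch of the same technique, assuming every column is numeric. The name `row_means` is introduced only for illustration; the data is the sample from the question.

```r
# Sample data from the question.
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# One row per NA: column 1 holds the row index, column 2 the column index.
ind <- which(is.na(df), arr.ind = TRUE)

# Mean of each row with NAs ignored (one value per row of df).
row_means <- rowMeans(df, na.rm = TRUE)

# For each NA, look up the mean of the row it sits in,
# then assign through the (row, col) index matrix.
df[ind] <- row_means[ind[, 1]]
```

Indexing the row means by `ind[, 1]` is what keeps each NA paired with its own row, however many NAs a row contains.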
    
  • 2020-12-16 19:10

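    I think this works, provided each NA is matched to the mean of its own row (which() reports the NAs in column order, so the row means must be indexed by row):

    ind <- which(is.na(df), arr.ind=TRUE)
    df[ind] <- rowMeans(df, na.rm=TRUE)[ind[, 1]]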
    
  • 2020-12-16 19:18

    Using apply (note the returned object is a matrix):

    t( apply( df , 1 , function(x) { x[ is.na(x) ] = mean( x , na.rm = TRUE ); x } ) )
         c1 c2 c3
    [1,]  1  3  2
    [2,]  2  1  1
    [3,]  3  3  3
    [4,]  2  3  1
    

    We use an anonymous function to change the value of each NA in a row to the mean of that row. The only advantage is that you don't have to do any more typing if the number of rows increases. It is not particularly efficient or fast computationally, but it is easy to reason about (you won't notice the cost unless you have millions of rows).
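
Because apply() returns a matrix, a follow-up step is needed if a data frame is wanted. The sketch below is one way to do that, assuming all columns are numeric; `fill_row_means` is a hypothetical helper name, not part of any package.

```r
# Sample data from the question.
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# Hypothetical helper: fill each NA with the mean of its row.
# Assumes every column of `x` is numeric.
fill_row_means <- function(x) {
  filled <- t(apply(x, 1, function(r) {
    # A row that is entirely NA yields NaN here.
    r[is.na(r)] <- mean(r, na.rm = TRUE)
    r
  }))
  # apply() returns a matrix, so convert it back to a data frame.
  as.data.frame(filled)
}

result <- fill_row_means(df)
```

The transpose is needed because apply() over rows returns one column per input row.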

  • 2020-12-16 19:24

    My solution is

    rwmns = rowMeans(df,na.rm=TRUE)
    df$c1[is.na(df$c1)] = rwmns[is.na(df$c1)]
    df$c2[is.na(df$c2)] = rwmns[is.na(df$c2)]
    df$c3[is.na(df$c3)] = rwmns[is.na(df$c3)]
    > df
      c1 c2 c3
    1  1  3  2
    2  2  1  1
    3  3  3  3
    4  2  3  1
    

    Is there a more elegant way, especially when the data frame has many columns?
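
One way to avoid repeating the same line per column is to loop over the column names. This is only a sketch of the same column-by-column idea, assuming all columns are numeric; the loop variable names are illustrative.

```r
# Sample data from the question.
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# Compute the row means once, before any value is filled in.
rwmns <- rowMeans(df, na.rm = TRUE)

# Apply the same replacement to every column, however many there are.
for (col in names(df)) {
  missing <- is.na(df[[col]])
  df[[col]][missing] <- rwmns[missing]
}
```

Computing `rwmns` before the loop matters: the means are taken from the original values, so filling one column cannot change the value used for another.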

  • 2020-12-16 19:28

    Another option is na.aggregate from library(zoo), applied after transposing the dataset (na.aggregate fills NAs column by column, using the mean by default):

    library(zoo)
    df[] <- t(na.aggregate(t(df)))
    df
    #  c1 c2 c3
    #1  1  3  2
    #2  2  1  1
    #3  3  3  3
    #4  2  3  1
    