means and SD for columns in a dataframe with NA values

庸人自扰 2021-01-20 01:58

I'm trying to calculate the mean and standard deviation of several columns (all except the first column) in a data.frame that contains NA values.

I've tried

3 Answers
  • 2021-01-20 01:59
    sapply(df, function(cl) list(means=mean(cl,na.rm=TRUE), sds=sd(cl,na.rm=TRUE)))
          col1     col2     col3     col4     col5    
    means 3        8        12.5     18.25    22.5    
    sds   1.581139 1.581139 1.290994 1.707825 1.290994
    
    as.data.frame( t(sapply(df, function(cl) list(means=mean(cl,na.rm=TRUE), 
                                                  sds=sd(cl,na.rm=TRUE))) ))
         means      sds
    col1     3 1.581139
    col2     8 1.581139
    col3  12.5 1.290994
    col4 18.25 1.707825
    col5  22.5 1.290994
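Because the anonymous function above returns a list, sapply() builds a matrix of mode list, so the columns of the as.data.frame(t(...)) result are lists rather than numeric vectors, and it also includes col1, which the question wants to skip. A minimal sketch of a variant, assuming `df` holds the same example data as the third answer: returning c(...) instead of list(...) lets sapply() simplify to an ordinary numeric matrix, and df[-1] drops the first column.

```r
# Example data with NAs (same values as in the third answer)
df <- data.frame(
  col1 = c(1, 2, 3, 4, 5),
  col2 = c(6, 7, 8, 9, 10),
  col3 = c(11, 12, 13, 14, NA),
  col4 = c(16, NA, 18, 19, 20),
  col5 = c(21, 22, 23, 24, NA)
)

# c(...) returns a named numeric vector, so sapply() simplifies to a
# 2 x 4 numeric matrix; df[-1] leaves out the first column
stats <- sapply(df[-1], function(cl) c(means = mean(cl, na.rm = TRUE),
                                       sds   = sd(cl, na.rm = TRUE)))

# Transpose so each row describes one column of df, then convert
stats_df <- as.data.frame(t(stats))
print(stats_df)
```

The means and sds columns are now numeric, and their values match the col2 to col5 rows of the output above.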
    
  • 2021-01-20 02:25

    Most of the functions you would use here (e.g. colMeans(), mean(), sd()) have an argument called na.rm that defaults to FALSE, so a single NA in a column makes its result NA. Just do colMeans(x = your_df[-1], na.rm = TRUE) to get the means of every column except the first, and you'll be good to go. The same argument works with mean() and sd() if you want to go column by column.
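A minimal sketch of this approach, assuming a small example data frame named `df` (hypothetical data, chosen to match the third answer). Base R has no colSds() counterpart to colMeans(), so the sketch uses apply() over the columns for the standard deviations; apply() forwards na.rm = TRUE to sd().

```r
# Hypothetical example data with some NAs
df <- data.frame(
  col1 = c(1, 2, 3, 4, 5),
  col2 = c(6, 7, 8, 9, 10),
  col3 = c(11, 12, 13, 14, NA),
  col4 = c(16, NA, 18, 19, 20),
  col5 = c(21, 22, 23, 24, NA)
)

# Means of every column except the first; NAs are dropped per column
col_means <- colMeans(df[-1], na.rm = TRUE)

# No colSds() in base R: apply sd() over columns (MARGIN = 2);
# the extra na.rm = TRUE argument is passed on to sd()
col_sds <- apply(df[-1], 2, sd, na.rm = TRUE)

print(col_means)
print(col_sds)
```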

  • 2021-01-20 02:25

    The following example code may prove useful.

    # Create a 5 column dataframe that contains some NAs
    col1 <- c(1,2,3,4,5)
    col2 <- c(6,7,8,9,10)
    col3 <- c(11,12,13,14,NA)
    col4 <- c(16,NA,18,19,20)
    col5 <- c(21,22,23,24,NA)
    dataframe <- data.frame(col1,col2,col3,col4,col5)
    
    # Apply the mean() function to all but the first column of the dataframe
    apply(dataframe[,2:ncol(dataframe)], 2, function(x) mean(x, na.rm=TRUE))
    
    # Check that the returned values are correct:
    mean(col2)
    mean(col3, na.rm=TRUE)
    mean(col4, na.rm=TRUE)
    mean(col5, na.rm=TRUE)
    

    For the standard deviation, replace mean() with sd().
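A short sketch of the sd() version of the apply() call, using the same example vectors. It swaps 2:ncol(dataframe) for -1 with drop = FALSE, a small deviation from the answer's code: if the data frame ever had only two columns, dataframe[, 2:ncol(dataframe)] would collapse to a plain vector and apply() would fail, whereas drop = FALSE keeps it a data frame.

```r
# Same example data as above
col1 <- c(1, 2, 3, 4, 5)
col2 <- c(6, 7, 8, 9, 10)
col3 <- c(11, 12, 13, 14, NA)
col4 <- c(16, NA, 18, 19, 20)
col5 <- c(21, 22, 23, 24, NA)
dataframe <- data.frame(col1, col2, col3, col4, col5)

# Standard deviation of all but the first column; apply() forwards
# na.rm = TRUE to sd(), and drop = FALSE keeps a data frame even if
# only one column remains
sds <- apply(dataframe[, -1, drop = FALSE], 2, sd, na.rm = TRUE)

# Spot check against a direct call on one column
stopifnot(isTRUE(all.equal(sds[["col4"]], sd(col4, na.rm = TRUE))))
print(sds)
```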
