Subset dataframe based on number of observations in each column

北海茫月 2020-12-12 01:06

I have a problem and would appreciate a hand. I tried to come up with a solution, but I have no idea how to work it out.

Please use this to recreate my

1 answer
  • 2020-12-12 01:29

    Try

    df1[, colSums(!is.na(df1)) >= 7]
    #   A1 A3
    #1  87 NA
    #2  67 38
    #3  80 10
    #4  36 41
    #5  71 NA
    #6   6 66
    #7  26 NA
    #8  15  7
    #9  14 29
    #10 46 NA
    #11 19 70
    #12 93 23
    #13  5 46
    #14 94 55
    

    Step by step

    First, find out which values in your data are not missing.

    !is.na(df1)
    

    This returns a logical matrix, with TRUE wherever a value is present:

    #        A1    A2    A3
    # [1,] TRUE  TRUE FALSE
    # [2,] TRUE FALSE  TRUE
    # [3,] TRUE  TRUE  TRUE
    # [4,] TRUE  TRUE  TRUE
    # [5,] TRUE  TRUE FALSE
    # [6,] TRUE  TRUE  TRUE
    # [7,] TRUE  TRUE FALSE
    # [8,] TRUE FALSE  TRUE
    # [9,] TRUE FALSE  TRUE
    #[10,] TRUE FALSE FALSE
    #[11,] TRUE FALSE  TRUE
    #[12,] TRUE FALSE  TRUE
    #[13,] TRUE FALSE  TRUE
    #[14,] TRUE FALSE  TRUE
    

    Use colSums to count the non-missing observations in each column

    colSums(!is.na(df1))
    #A1 A2 A3 
    #14  6 10
    

    Apply your condition, "at least 7 observations (count) per column"

    colSums(!is.na(df1)) >= 7
    #   A1    A2    A3 
    # TRUE FALSE  TRUE
    

    Finally, you need to use this vector to subset your data

    df1[, colSums(!is.na(df1)) >= 7]
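
    Since df1 itself is not shown here, the steps above can be tried on a small made-up data frame. This is only a sketch: the name toy, its columns, and the threshold of 3 are assumptions for illustration, not the asker's data.

    ```r
    # Hypothetical toy data (NOT the asker's df1)
    toy <- data.frame(
      x = c(1, 2, 3, 4),
      y = c(NA, NA, NA, 4),
      z = c(1, NA, 3, 4)
    )

    # Step 1 and 2: count the non-missing values in each column
    n_obs <- colSums(!is.na(toy))

    # Step 3: flag columns with at least 3 observations
    keep <- n_obs >= 3

    # Step 4: subset the columns with the logical vector
    result <- toy[, keep, drop = FALSE]
    print(names(result))
    ```

    Here x has 4 observations, y has 1, and z has 3, so only x and z are kept.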
    

    Turn this into a function if you need it regularly

    almost_complete_cols <- function(data, min_obs) {
      data[, colSums(!is.na(data)) >= min_obs, drop = FALSE]
    }
    
    almost_complete_cols(df1, 7)
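
    The drop = FALSE in the function matters when only one column passes the threshold: for a base data.frame, selecting a single column with [ , ] returns a plain vector by default. A minimal sketch with made-up data (toy and its columns are hypothetical):

    ```r
    # Hypothetical data where only column a has at least 2 observations
    toy <- data.frame(a = c(1, 2, 3), b = c(NA, NA, 3))
    keep <- colSums(!is.na(toy)) >= 2

    as_vector <- toy[, keep]                # collapses to a numeric vector
    as_frame  <- toy[, keep, drop = FALSE]  # stays a one-column data frame

    print(is.data.frame(as_vector))
    print(is.data.frame(as_frame))
    ```

    With drop = FALSE the function always returns a data frame, so code that calls it does not need a special case for a single surviving column.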
    