Show columns with NAs in a data.frame

甜味超标 2021-02-01 05:57

I'd like to show the names of columns in a large dataframe that contain missing values. Basically, I want the equivalent of complete.cases(df) but for columns, not rows. Some o

3 Answers
  • 2021-02-01 06:24

    This is the fastest way that I know of:

    unlist(lapply(df, function(x) any(is.na(x))))
    

    EDIT:

    I guess everyone else wrote theirs out as a complete function, so here is mine:

    nacols <- function(df) {
        colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
    }
    

    And if you microbenchmark the 4 solutions on a Windows 7 machine:

    Unit: microseconds
        expr     min      lq  median      uq        max
    1 ANDRIE  85.380  91.911 106.375 116.639    863.124
    2 MANOEL  87.712  93.778 105.908 118.971   8426.886
    3  MOIRA 764.215 798.273 817.402 876.188 143039.632
    4  TYLER  51.321  57.853  62.518  72.316   1365.136
    

    And here's a visual of that: [image not available]

    Edit: At the time I wrote this, anyNA either did not exist or I was unaware of it. It may speed things up further. Per the help page ?anyNA:

    The generic function anyNA implements any(is.na(x)) in a possibly faster way (especially for atomic vectors).

    nacols <- function(df) {
        colnames(df)[unlist(lapply(df, function(x) anyNA(x)))]
    }
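
    As a quick sanity check, the two versions can be run side by side on a small data frame. This is only a sketch: the data frame tmp_df and the names nacols_isna and nacols_anyna are made up for illustration and are not from the question.

    ```r
    # Hypothetical sample data: columns b and c contain NAs, a and d do not
    tmp_df <- data.frame(
      a = 1:4,
      b = c(1.5, NA, 3.5, 4.5),
      c = c("u", "v", NA, NA),
      d = c(TRUE, FALSE, TRUE, FALSE)
    )

    # Version using any(is.na(x)) on each column
    nacols_isna <- function(df) {
      colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
    }

    # Version using anyNA(x) on each column
    nacols_anyna <- function(df) {
      colnames(df)[unlist(lapply(df, anyNA))]
    }

    res_isna <- nacols_isna(tmp_df)
    res_anyna <- nacols_anyna(tmp_df)

    print(res_isna)                        # names of the columns containing NAs
    print(identical(res_isna, res_anyna))  # do both versions agree?
    ```

    Both calls should return the same column names; any speed difference would have to be measured on your own data.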
    
  • 2021-02-01 06:38

    Here is one way:

    colnames(tmp)[colSums(is.na(tmp)) > 0]
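
    To see why this works, it can help to look at the intermediate result. A minimal sketch, assuming a made-up tmp (the original tmp is not shown in the thread) and the illustrative names na_counts and na_cols:

    ```r
    # Hypothetical stand-in for tmp: y and z each contain one NA
    tmp <- data.frame(x = 1:3, y = c(1, NA, 3), z = c(NA, "a", "b"))

    # is.na() on a data frame returns a logical matrix with the same column names;
    # colSums() then counts the TRUE values (the NAs) in each column
    na_counts <- colSums(is.na(tmp))
    print(na_counts)

    # Keep the names of the columns with at least one NA
    na_cols <- colnames(tmp)[na_counts > 0]
    print(na_cols)
    ```

    Note that is.na(tmp) builds a logical matrix the size of the whole data frame, whereas the lapply-based answers check one column at a time.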
    

    Hope it helps,

    Manoel

  • 2021-02-01 06:40

    One way...

    nacols <- function(x){
      y <- sapply(x, function(xx)any(is.na(xx)))
      names(y[y])
    }  
    
    nacols(tmp)
    [1] "y" "z"
    

    Explanation: since the result y is a logical vector, names(y[y]) returns the names of y for only those cases where y is TRUE.
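
    The indexing step can be seen in isolation with a small example. This is a sketch under the assumption that tmp looks something like the data frame below; the actual tmp is not shown in the thread.

    ```r
    # Hypothetical stand-in for tmp, chosen so that y and z contain NAs
    tmp <- data.frame(x = 1:3, y = c(1, NA, 3), z = c(NA, "a", "b"))

    # One logical value per column, named after the column
    y <- sapply(tmp, function(xx) any(is.na(xx)))
    print(y)

    # Indexing a logical vector by itself keeps only the TRUE elements,
    # so the remaining names are the columns that contain NAs
    print(y[y])
    print(names(y[y]))
    ```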
