Show columns with NAs in a data.frame

甜味超标 2021-02-01 05:57

I'd like to show the names of columns in a large dataframe that contain missing values. Basically, I want the equivalent of complete.cases(df) but for columns, not rows. Some o

3 answers
  •  被撕碎了的回忆
    2021-02-01 06:24

    This is the fastest way that I know of:

    unlist(lapply(df, function(x) any(is.na(x))))
    
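    As a small illustration of what that one-liner returns, here is a sketch on a toy data frame. The data frame and its column names are made up for this example and are not from the question.

```r
# Hypothetical toy data frame; the column names are made up for illustration.
df <- data.frame(
  a = c(1, 2, NA),
  b = c("x", "y", "z"),
  c = c(NA, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# One logical flag per column: TRUE where the column holds at least one NA.
has_na <- unlist(lapply(df, function(x) any(is.na(x))))
print(has_na)
#     a     b     c
#  TRUE FALSE  TRUE
```

    The result is a named logical vector, so it can be used directly to index the column names.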

    EDIT:

    Everyone else wrote out a complete function, so here is mine:

    nacols <- function(df) {
        colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
    }
    
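    A quick usage sketch for nacols, assuming two made-up data frames (with_gaps and no_gaps are hypothetical names). It also shows what comes back when no column has a missing value.

```r
# Same definition as in the answer, repeated so the example runs on its own.
nacols <- function(df) {
  colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}

# Hypothetical example data.
with_gaps <- data.frame(
  id    = 1:3,
  score = c(0.5, NA, 0.9),
  label = c("a", NA, "c")
)
no_gaps <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))

print(nacols(with_gaps))  # "score" "label"
print(nacols(no_gaps))    # character(0)
```

    An empty character vector, rather than NULL or an error, means the result can be passed straight to length() or used for subsetting without a special case.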

    And if you microbenchmark the 4 solutions on a Windows 7 machine:

    Unit: microseconds
        expr     min      lq  median      uq        max
    1 ANDRIE  85.380  91.911 106.375 116.639    863.124
    2 MANOEL  87.712  93.778 105.908 118.971   8426.886
    3  MOIRA 764.215 798.273 817.402 876.188 143039.632
    4  TYLER  51.321  57.853  62.518  72.316   1365.136
    

    And here's a visual of that: [figure: plot of the microbenchmark timings above]

    Edit: At the time I wrote this, anyNA either did not exist or I was unaware of it. It may speed things up further. Per the help page ?anyNA:

    The generic function anyNA implements any(is.na(x)) in a possibly faster way (especially for atomic vectors).

    nacols <- function(df) {
        # anyNA can be passed to lapply directly; no wrapper function is needed
        colnames(df)[unlist(lapply(df, anyNA))]
    }
    
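    To check whether anyNA actually helps on a given machine, the two versions can be compared along these lines. This is only a sketch: the function names nacols_isna and nacols_anyna and the test data are invented for the example, and the timing step assumes the microbenchmark package is installed (it is skipped otherwise). No timings are shown because they depend on the machine and the R version.

```r
# Both definitions are copied from the answer; the two function names are
# made up here so they can coexist in one session.
nacols_isna <- function(df) {
  colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}
nacols_anyna <- function(df) {
  colnames(df)[unlist(lapply(df, function(x) anyNA(x)))]
}

# Hypothetical test data: 50 numeric columns of 10,000 rows,
# with a single NA planted in every fifth column.
set.seed(1)
big <- as.data.frame(matrix(rnorm(50 * 10000), ncol = 50))
for (j in seq(5, 50, by = 5)) big[[j]][sample(10000, 1)] <- NA

# The two versions should agree before their speed is compared.
stopifnot(identical(nacols_isna(big), nacols_anyna(big)))

# Timing is optional: it only runs if the microbenchmark package is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    isna  = nacols_isna(big),
    anyna = nacols_anyna(big),
    times = 100L
  ))
}
```

    Checking that both functions return the same column names first guards against comparing the speed of two things that do not compute the same answer.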
