Show columns with NAs in a data.frame

后端未结

I'd like to show the names of the columns in a large data frame that contain missing values. Basically, I want the equivalent of complete.cases(df) but for columns, not rows. Some o

3 Answers

被撕碎了的回忆

2021-02-01 06:24
This is the fastest way that I know of:
```
unlist(lapply(df, function(x) any(is.na(x))))
```
EDIT:

Everyone else wrote theirs out as a complete function, so here is mine:
```
nacols <- function(df) {
    colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}
```
Here is a microbenchmark of the 4 solutions on a Windows 7 machine:
```
Unit: microseconds
    expr     min      lq  median      uq        max
1 ANDRIE  85.380  91.911 106.375 116.639    863.124
2 MANOEL  87.712  93.778 105.908 118.971   8426.886
3  MOIRA 764.215 798.273 817.402 876.188 143039.632
4  TYLER  51.321  57.853  62.518  72.316   1365.136
```
Edit: At the time I wrote this, anyNA either did not exist or I was unaware of it. It may speed things up further. Per the help page ?anyNA:

The generic function anyNA implements any(is.na(x)) in a possibly faster way (especially for atomic vectors).
```
nacols <- function(df) {
    colnames(df)[unlist(lapply(df, function(x) anyNA(x)))]
}
```
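As a quick check of the anyNA version, here is a minimal sketch on a small made-up data frame. The data and the name `nacols_vapply` are assumptions for illustration, not part of the original answer. It swaps `unlist(lapply(...))` for `vapply()`, which declares that each column yields exactly one logical value.

```r
# Hypothetical example data; the column names are illustrative only.
df <- data.frame(
  x = 1:4,
  y = c(1, NA, 3, 4),
  z = c("a", "b", NA, "d"),
  stringsAsFactors = FALSE
)

# Same idea as nacols() above, but vapply() checks that every column
# produces a single logical value.
nacols_vapply <- function(df) {
  names(df)[vapply(df, anyNA, logical(1))]
}

result <- nacols_vapply(df)
print(result)
#> [1] "y" "z"
```

For a data frame, `names(df)` and `colnames(df)` return the same thing, so either can be used here.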
挽巷

2021-02-01 06:38
Here is one way:
```
colnames(tmp)[colSums(is.na(tmp)) > 0]
```
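To see what the one-liner does step by step, here is a small sketch. The contents of `tmp` are made up for illustration, since the original data is not shown. `is.na()` on a data frame returns a logical matrix, and `colSums()` then counts the TRUE values per column.

```r
# Hypothetical stand-in for tmp; the real data is not shown in the thread.
tmp <- data.frame(
  x = c(1, 2, 3),
  y = c(NA, 2, NA),
  z = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column (a named numeric vector).
na_counts <- colSums(is.na(tmp))
print(na_counts)

# Keep only the names of columns with at least one NA.
na_columns <- names(na_counts)[na_counts > 0]
print(na_columns)
#> [1] "y" "z"
```

A side benefit of this approach is that `na_counts` also tells you how many values are missing in each column, not just whether any are.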
Hope it helps,

Manoel
臣服心动

2021-02-01 06:40
One way...
```
nacols <- function(x) {
  y <- sapply(x, function(xx) any(is.na(xx)))
  names(y[y])
}

nacols(tmp)
#> [1] "y" "z"
```
Explanation: since y is a named logical vector, y[y] keeps only the elements that are TRUE, so names(y[y]) returns the names of the columns for which y is TRUE.
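The indexing trick can be seen in isolation with a hand-built logical vector. The values below are made up for illustration and simply mimic what `sapply()` would return for three columns.

```r
# Hypothetical result of the sapply() call: one TRUE/FALSE per column.
y <- c(x = FALSE, y = TRUE, z = TRUE)

# Indexing a logical vector by itself drops the FALSE elements.
kept <- y[y]
print(kept)

# The surviving names are the columns that contain NAs.
print(names(kept))
#> [1] "y" "z"
```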