I\'d like to show the names of columns in a large dataframe that contain missing values. Basically, I want the equivalent of complete.cases(df) but for columns, not rows. Some o
This is the fastest way that I know of:
unlist(lapply(df, function(x) any(is.na(x))))
EDIT:
I guess everyone else wrote it out complete so here it is complete:
nacols <- function(df) {
colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}
And if you microbenchmark the 4 solutions on a WIN 7 machine:
Unit: microseconds
expr min lq median uq max
1 ANDRIE 85.380 91.911 106.375 116.639 863.124
2 MANOEL 87.712 93.778 105.908 118.971 8426.886
3 MOIRA 764.215 798.273 817.402 876.188 143039.632
4 TYLER 51.321 57.853 62.518 72.316 1365.136
And here's a visual of that:
Edit At the time I wrote this anyNA
did not exist or I was unaware of it. This may speed things up moreso...per the help manual for ?anyNA
:
The generic function
anyNA
implementsany(is.na(x))
in a possibly faster way (especially for atomic vectors).
nacols <- function(df) {
colnames(df)[unlist(lapply(df, function(x) anyNA(x)))]
}
Here is one way:
colnames(tmp)[colSums(is.na(tmp)) > 0]
Hope it helps,
Manoel
One way...
nacols <- function(x){
y <- sapply(x, function(xx)any(is.na(xx)))
names(y[y])
}
nacols(tmp)
[1] "y" "z"
Explanation: since the result y
is a logical vector, names(y[y])
returns the names of y for only those cases where y
is TRUE.