I want to omit rows where NA appears in both of two columns. I'm familiar with na.omit, is.na, and compl
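The question does not include its data. The sketch below therefore builds a small hypothetical data frame `df` using the column names the answer's code relies on (x, y and z); the values are invented. It shows why `na.omit()` alone is too blunt here. That function drops every row with an NA in any column, whereas the goal is to drop only the rows where x and y are both NA.

```r
# Hypothetical example data: the question's real data is not shown,
# so these values are assumptions made for illustration.
df <- data.frame(
  x = c(1, NA, NA, 4, NA, 6),
  y = c(NA, NA, 3, 4, NA, 6),
  z = c(1, 2, NA, NA, NA, 6)
)

# na.omit() drops every row that has an NA in *any* column,
# so only row 6 survives here.
dropped_any <- na.omit(df)

# The goal is narrower: drop a row only when x AND y are both NA
# (rows 2 and 5), keeping rows 1, 3, 4 and 6.
both_na <- is.na(df$x) & is.na(df$y)
kept <- df[!both_na, ]
```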
You can use apply to go through the rows one at a time and flag the ones with more than one NA:
sel <- apply( df, 1, function(x) sum(is.na(x))>1 )
sel is TRUE for the rows you want to get rid of, so negate it when subsetting:
df[ !sel, ]
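Here is a minimal runnable sketch of these steps on a hypothetical data frame `df`, with values invented for illustration. Pay attention to which rows `sel` flags. `df[sel, ]` returns exactly those rows, so `!sel` is what omits them.

```r
# Hypothetical data: the values are made up for illustration.
df <- data.frame(
  x = c(1, NA, NA, 4, NA, 6),
  y = c(NA, NA, 3, 4, NA, 6),
  z = c(1, 2, NA, NA, NA, 6)
)

# apply() converts df to a matrix and calls the function once per row
# (MARGIN = 1); sel is TRUE where a row has more than one NA in any columns.
sel <- apply(df, 1, function(x) sum(is.na(x)) > 1)

flagged <- df[sel, ]   # rows 2, 3 and 5: the rows with two or more NAs
kept    <- df[!sel, ]  # rows 1, 4 and 6: the flagged rows omitted
```

Row 3 is flagged only because x and z are missing. This is why restricting apply to x and y matters.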
To ignore the z column, just omit it from the apply:
sel <- apply( df[,c("x","y")], 1, function(x) sum(is.na(x))>1 )
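The following sketch compares the two selections on the same hypothetical `df` (invented values). Once z is left out of the apply call, a row with NAs in x and z but not y is no longer flagged.

```r
# Hypothetical data: the values are made up for illustration.
df <- data.frame(
  x = c(1, NA, NA, 4, NA, 6),
  y = c(NA, NA, 3, 4, NA, 6),
  z = c(1, 2, NA, NA, NA, 6)
)

# Counting NAs across all columns versus only across x and y.
sel_all <- apply(df, 1, function(x) sum(is.na(x)) > 1)
sel_xy  <- apply(df[, c("x", "y")], 1, function(x) sum(is.na(x)) > 1)

# Row 3 (x and z missing, y present) is flagged by sel_all but not by sel_xy.
result <- df[!sel_xy, ]  # keeps rows 1, 3, 4 and 6
```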
If all of the chosen columns have to be NA (not just more than one of them), swap the sum for all():
sel <- apply( df[,c("x","y")], 1, function(x) all(is.na(x)) )
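This sketch uses the same hypothetical `df` to show when the switch to `all()` matters. With three columns, "more than one NA" and "all NA" select different rows. With exactly the two columns x and y, they are the same test. The `rowSums()` line is a vectorized equivalent, included for comparison.

```r
# Hypothetical data: the values are made up for illustration.
df <- data.frame(
  x = c(1, NA, NA, 4, NA, 6),
  y = c(NA, NA, 3, 4, NA, 6),
  z = c(1, 2, NA, NA, NA, 6)
)

# Across all three columns the two tests differ.
more_than_one <- apply(df, 1, function(x) sum(is.na(x)) > 1)  # rows 2, 3 and 5
all_na        <- apply(df, 1, function(x) all(is.na(x)))      # row 5 only

# With exactly two columns, "more than one NA" and "all NA" coincide.
all_na_xy <- apply(df[, c("x", "y")], 1, function(x) all(is.na(x)))

# Vectorized equivalent for two columns: count NAs per row with rowSums().
all_na_xy_vec <- rowSums(is.na(df[, c("x", "y")])) == 2

result <- df[!all_na_xy, ]  # drops rows 2 and 5
```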
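The other solutions here are more specific to this particular problem, but apply is worth learning because it solves many other problems. The cost is speed. apply converts the data frame to a matrix and calls the function once per row, and in the benchmark below its median time is more than 2.5 times that of the vectorized is.na() version (the usual caveats about small datasets and speed testing apply):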
> microbenchmark( df[!with(df,is.na(x)& is.na(y)),], df[rowSums(is.na(df[c("x", "y")])) != 2, ], df[ apply( df, 1, function(x) sum(is.na(x))>1 ), ] )
Unit: microseconds
                                             expr     min       lq   median       uq      max neval
             df[!with(df, is.na(x) & is.na(y)), ]  67.148  71.5150  76.0340  86.0155 1049.576   100
       df[rowSums(is.na(df[c("x", "y")])) != 2, ] 132.064 139.8760 145.5605 166.6945  498.934   100
df[apply(df, 1, function(x) sum(is.na(x)) > 1), ] 175.372 184.4305 201.6360 218.7150  321.583   100
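The third benchmarked expression does not return the same rows as the other two. It checks all three columns and keeps the flagged rows instead of dropping them. The sketch below rewrites it to drop rows where both x and y are NA, so all three approaches can be compared like for like. It uses a hypothetical `df` with invented values and only checks that the results agree. It does not time them.

```r
# Hypothetical data: the values are made up for illustration.
df <- data.frame(
  x = c(1, NA, NA, 4, NA, 6),
  y = c(NA, NA, 3, 4, NA, 6),
  z = c(1, 2, NA, NA, NA, 6)
)

# 1. Vectorized logical test on the two columns.
r1 <- df[!with(df, is.na(x) & is.na(y)), ]

# 2. Count NAs per row with rowSums() and drop rows where both are missing.
r2 <- df[rowSums(is.na(df[c("x", "y")])) != 2, ]

# 3. apply() restricted to x and y, negated so the matching rows are dropped.
r3 <- df[!apply(df[, c("x", "y")], 1, function(x) all(is.na(x))), ]
```

To time these equivalent versions, you can wrap the three expressions in microbenchmark() as above.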