How to omit rows with NA in only two columns in R?

梦谈多话 2021-02-04 10:39

I want to omit rows where NA appears in both of two columns.

I'm familiar with na.omit, is.na, and compl

4 answers
  •  慢半拍i (OP)
    2021-02-04 11:37

    You can use apply to slice up the rows and flag those with more than one NA:

    sel <- apply( df, 1, function(x) sum(is.na(x))>1 )
    

    Then you can drop the flagged rows by negating the selector:

    df[ !sel, ]
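
    As a minimal sketch of the selector on concrete data (the sample values are made up for illustration; only the column names x, y, z come from the answer), negating sel keeps the rows that were not flagged:

    ```r
    # Hypothetical sample data; only the column names follow the answer.
    df <- data.frame(
      x = c(1, NA, 3, NA),
      y = c(NA, NA, 6, 8),
      z = c(10, 20, 30, 40)
    )

    # TRUE for every row that contains more than one NA (across all columns)
    sel <- apply(df, 1, function(row) sum(is.na(row)) > 1)

    # Keep only the rows that were NOT flagged
    kept <- df[!sel, ]
    ```

    Here only the second row has NA in both x and y, so it is the only row dropped.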
    

    To ignore the z column, just omit it from the apply:

    sel <- apply( df[,c("x","y")], 1, function(x) sum(is.na(x))>1 )
    

    If every one of the checked columns has to be NA for a row to be flagged, just change the function a little:

    sel <- apply( df[,c("x","y")], 1, function(x) all(is.na(x)) )
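
    To see why restricting the columns matters, here is a small sketch with made-up data in which z also contains NA. The names sel_all, sel_xy and kept_xy are illustrative, not from the original answer:

    ```r
    # Hypothetical sample data: z has NAs that should not affect the result.
    df <- data.frame(
      x = c(1, NA, 3, NA),
      y = c(NA, NA, 6, 8),
      z = c(NA, 20, NA, 40)
    )

    # Counting NAs over every column also flags row 1 (NA in y and z)
    sel_all <- apply(df, 1, function(row) sum(is.na(row)) > 1)

    # Checking only x and y flags just the rows where both are NA
    sel_xy <- apply(df[, c("x", "y")], 1, function(row) all(is.na(row)))

    # Drop the rows where x and y are both missing
    kept_xy <- df[!sel_xy, ]
    ```

    With this data the whole-frame selector would wrongly discard the first row, while the x/y selector removes only the second.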
    

    The other solutions here are more specific to this particular problem, but apply is worth learning as it solves many other problems. The cost is speed (usual caveats about small datasets and speed testing apply):

    > microbenchmark( df[!with(df,is.na(x)& is.na(y)),], df[rowSums(is.na(df[c("x", "y")])) != 2, ], df[ apply( df, 1, function(x) sum(is.na(x))>1 ), ] )
    Unit: microseconds
                                                  expr     min       lq   median       uq      max neval
                  df[!with(df, is.na(x) & is.na(y)), ]  67.148  71.5150  76.0340  86.0155 1049.576   100
            df[rowSums(is.na(df[c("x", "y")])) != 2, ] 132.064 139.8760 145.5605 166.6945  498.934   100
     df[apply(df, 1, function(x) sum(is.na(x)) > 1), ] 175.372 184.4305 201.6360 218.7150  321.583   100
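
    As a rough check that the vectorised forms from the benchmark and the apply form agree when all three are restricted to x and y, the sketch below builds a keep mask with each. The data and the keep_* names are made up for illustration:

    ```r
    # Hypothetical sample data; only the column names follow the answer.
    df <- data.frame(
      x = c(1, NA, 3, NA),
      y = c(NA, NA, 6, 8),
      z = c(NA, 20, NA, 40)
    )

    # Direct logical test on the two columns
    keep_with <- !with(df, is.na(x) & is.na(y))

    # Count NAs per row over x and y; a count of 2 means both are missing
    keep_rowsums <- rowSums(is.na(df[c("x", "y")])) != 2

    # Row-wise apply over x and y, negated to get a keep mask
    keep_apply <- !apply(df[, c("x", "y")], 1, function(row) all(is.na(row)))

    # All three masks select the same rows
    result <- df[keep_with, ]
    ```

    The three masks can then be used interchangeably to subset df. Timing differences on your own data are best measured rather than assumed.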
    
