How to omit rows with NA in only two columns in R?

梦谈多话 2021-02-04 10:39

I want to omit rows where NA appears in both of two columns.

I'm familiar with na.omit, is.na, and compl

4 answers
  •  小蘑菇 (original poster)
    2021-02-04 11:19

    Use rowSums with is.na, like this:

    > df[rowSums(is.na(df[c("x", "y")])) != 2, ]
       x y  z
    1  1 4  8
    2  2 5  9
    4  3 6 11
    5 NA 7 NA
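
    To try this end to end, here is a minimal sketch. The data frame is hypothetical: it is built to be consistent with the output printed above, and the z value in row 3 (the dropped row) is an assumption, since the original data isn't shown.

```r
# Hypothetical sample data, consistent with the printed output above.
# The z value in row 3 is an assumption; that row is dropped anyway.
df <- data.frame(
  x = c(1, 2, NA, 3, NA),
  y = c(4, 5, NA, 6, 7),
  z = c(8, 9, 10, 11, NA)
)

# is.na() gives a logical matrix; rowSums() counts the NAs per row
# across x and y. A count of 2 means both columns are missing.
both_na <- rowSums(is.na(df[c("x", "y")])) == 2

# Keep every row except those where x and y are both NA.
result <- df[!both_na, ]
print(result)
```

    Row 5 is kept even though x and z are NA, because only x and y are checked and y is present there.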
    

    To benchmark this, and to show how easily the approach generalizes to more columns, consider the following:

    ## Sample data with 10 columns and 1 million rows
    set.seed(123)
    df <- data.frame(replicate(10, sample(c(NA, 1:20), 
                                          1e6, replace = TRUE)))
    

    First, here's what things look like if you're just interested in two columns. Both solutions are pretty legible and short. Speed is quite close.

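    # Explicit test: drop rows where X1 and X2 are both NA
    f1 <- function() {
      df[!with(df, is.na(X1) & is.na(X2)), ]
    }
    # rowSums test: drop rows where the NA count over columns 1:2 equals 2
    f2 <- function() {
      df[rowSums(is.na(df[1:2])) != 2, ]
    }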
    
    library(microbenchmark)
    microbenchmark(f1(), f2(), times = 20)
    # Unit: milliseconds
    #  expr      min       lq   median       uq      max neval
    #  f1() 745.8378 1100.764 1128.047 1199.607 1310.236    20
    #  f2() 784.2132 1101.695 1125.380 1163.675 1303.161    20
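
    Before comparing timings, it may be worth confirming that the two approaches select the same rows. A small sketch, assuming a reduced version of the sample data (1,000 rows instead of 1 million; the name small is just for illustration):

```r
# Smaller version of the sample data so the check runs quickly.
set.seed(123)
small <- data.frame(replicate(10, sample(c(NA, 1:20),
                                         1000, replace = TRUE)))

# Explicit test on the two columns.
r1 <- small[!with(small, is.na(X1) & is.na(X2)), ]

# rowSums test on the same two columns.
r2 <- small[rowSums(is.na(small[1:2])) != 2, ]

# Compare the two results.
print(all.equal(r1, r2))
```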
    

    Next, let's look at the same problem, but this time considering NA values across the first 5 columns. Here the rowSums approach is slightly faster (median of about 1.19 s versus 1.33 s), and its syntax barely changes.

    f1_5 <- function() {
      df[!with(df, is.na(X1) & is.na(X2) & is.na(X3) &
                 is.na(X4) & is.na(X5)), ]
    } 
    f2_5 <- function() {
      df[rowSums(is.na(df[1:5])) != 5, ]
    } 
    
    microbenchmark(f1_5(), f2_5(), times = 20)
    # Unit: seconds
    #    expr      min       lq   median       uq      max neval
    #  f1_5() 1.275032 1.294777 1.325957 1.368315 1.572772    20
    #  f2_5() 1.088564 1.169976 1.193282 1.225772 1.275915    20
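
    Since only the column selection and the count change between the two rowSums versions, the pattern can be wrapped in a small helper. This is a sketch only; drop_all_na and its arguments are hypothetical names, not part of base R or any package.

```r
# Hypothetical helper: drop rows in which every column in `cols` is NA.
# `cols` can be column names or positions.
drop_all_na <- function(data, cols) {
  n_missing <- rowSums(is.na(data[cols]))
  # Keep rows where at least one of the chosen columns is present.
  data[n_missing != length(cols), , drop = FALSE]
}

# Small illustrative data frame (made up for this example).
demo <- data.frame(a = c(1, NA, NA),
                   b = c(NA, NA, 2),
                   c = c(NA, NA, NA))

# Row 2 is NA in both a and b, so it is dropped.
kept <- drop_all_na(demo, c("a", "b"))
print(kept)
```

    With this, the two-column case becomes drop_all_na(df, 1:2) and the five-column case drop_all_na(df, 1:5).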
    
