How to omit rows with NA in only two columns in R?

后端 未结 4 1077
梦谈多话
梦谈多话 2021-02-04 10:39

I want to omit rows where NA appears in both of two columns.

I\'m familiar with na.omit, is.na, and compl

4条回答
  •  囚心锁ツ
    2021-02-04 11:17

    df[!with(df,is.na(x)& is.na(y)),]
    #      x y  z
    #1  1 4  8
    #2  2 5  9
    #4  3 6 11
    #5 NA 7 NA
    

    I did benchmarked on a slightly bigger dataset. Here are the results:

    set.seed(237)
    df <- data.frame(x=sample(c(NA,1:20), 1e6, replace=T), y= sample(c(NA, 1:10), 1e6, replace=T), z= sample(c(NA, 5:15), 1e6,replace=T)) 
    
    f1 <- function() df[!with(df,is.na(x)& is.na(y)),]
    f2 <- function() df[rowSums(is.na(df[c("x", "y")])) != 2, ]
    f3 <- function()  df[ apply( df, 1, function(x) sum(is.na(x))>1 ), ] 
    
    library(microbenchmark)
    
    microbenchmark(f1(), f2(), f3(), unit="relative")
    Unit: relative
    #expr       min        lq    median        uq       max neval
    # f1()  1.000000  1.000000  1.000000  1.000000  1.000000   100
    # f2()  1.044812  1.068189  1.138323  1.129611  0.856396   100
    # f3() 26.205272 25.848441 24.357665 21.799930 22.881378   100
    

提交回复
热议问题