How should I drop blocks of NAs from an R data.table?

-上瘾入骨i 2021-02-05 19:37

I have a large R data.table with a multi-column key, where some of the value columns contain NAs. I'd like to remove groups that are entirely NA in one or more value

2 Answers
  • 2021-02-05 20:09

    To keep only the groups in which not all of Value is NA:

    setkey(DT, "Series")
    DT[, .SD[(!all(is.na(Value)))], by=Series]
    

    The parentheses around !all(...) are needed so that the leading ! is not parsed as data.table's not-join syntax, which Matthew will look into (see comments). This is equivalent:

    DT[, .SD[as.logical(!all(is.na(Value)))], by=Series]
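
    To make the group filter concrete, here is a minimal sketch on a small made-up table. The data are hypothetical (not the table from the question); only the Series and Value column names are taken from the code above.

    ```r
    library(data.table)

    # Hypothetical sample data: Value is entirely NA for series "a",
    # partly NA for "b", and complete for "c".
    DT <- data.table(
      Series = c("a", "a", "b", "b", "c", "c"),
      Value  = c(NA, NA, 1, NA, 3, 4)
    )

    # Keep every row of a group unless all of its Value entries are NA.
    # The outer parentheses stop the leading ! from being read as a not-join.
    kept <- DT[, .SD[(!all(is.na(Value)))], by = Series]
    print(kept)
    ```

    Series "a" should disappear as a whole, while "b" keeps both of its rows, including the one where Value is NA.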
    

    Building on that to answer the newly clarified question:

    allNA = function(x) all(is.na(x))     # define helper function
    for (i in c("Id","Series"))
        DT = DT[, if (!any(sapply(.SD,allNA))) .SD else NULL, by=i]
    DT
        Series Id Value1 Value2
     1:      i  1      1      1
     2:      i  2      2      2
     3:      i  3      3      3
     4:      b  5      5      2
     5:      b  6      6      3
     6:      f  5      5      2
     7:      f  6      6      3
     8:      j  5      5      5
     9:      j  6      6      6
    10:      c  7      7      4
    11:      c  8      8     NA
    12:      c  9      9      6
    13:      g  7      7      4
    14:      g  8      8      5
    15:      g  9      9      6
    16:      k  7      7      7
    17:      k  8      8      8
    18:      k  9      9      9
    

    That changes the row order, though, so it isn't precisely the result requested. The following keeps the original order and should be faster too.

    # starting fresh from original DT in question again
    DT[,drop:=FALSE]
    for (i in c("Series","Id"))
        DT[,drop:=drop|any(sapply(.SD,allNA)),by=i]
    DT[(!drop)][,drop:=NULL][]
        Series Id Value1 Value2
     1:      b  5      5      2
     2:      b  6      6      3
     3:      c  7      7      4
     4:      c  8      8     NA
     5:      c  9      9      6
     6:      f  5      5      2
     7:      f  6      6      3
     8:      g  7      7      4
     9:      g  8      8      5
    10:      g  9      9      6
    11:      i  1      1      1
    12:      i  2      2      2
    13:      i  3      3      3
    14:      j  5      5      5
    15:      j  6      6      6
    16:      k  7      7      7
    17:      k  8      8      8
    18:      k  9      9      9
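
    The flag-based approach can be tried end to end with the sketch below. The table is invented for illustration, and restricting the check to the value columns with .SDcols is my assumption, not part of the code above (which scans all non-grouping columns).

    ```r
    library(data.table)

    # Hypothetical data: Value1 is all NA for Series "a",
    # and Value2 is all NA for Id 2.
    DT <- data.table(
      Series = rep(c("a", "b", "c"), each = 2),
      Id     = rep(1:2, times = 3),
      Value1 = c(NA, NA, 1, 2, 3, NA),
      Value2 = c(1, NA, 3, NA, 5, NA)
    )

    allNA <- function(x) all(is.na(x))  # TRUE if a column is entirely NA

    # Mark rows by reference, one key column at a time; row order is untouched.
    DT[, drop := FALSE]
    for (col in c("Series", "Id")) {
      DT[, drop := drop | any(sapply(.SD, allNA)),
         by = col, .SDcols = c("Value1", "Value2")]
    }

    # Keep the unmarked rows, then remove the helper column.
    res <- DT[(!drop)][, drop := NULL][]
    print(res)
    ```

    Only the rows for (b, 1) and (c, 1) should remain, in their original order, because every other row belongs to a Series or Id whose group is all NA in some value column.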
    
  • 2021-02-05 20:25

    What about using the complete.cases function?

    DT[complete.cases(DT),]
    

    It drops every row that has an NA in any column (it works row by row, not group by group).

    > DT[complete.cases(DT),]
        Series Id Value1 Value2
     1:      b  4      4      1
     2:      b  5      5      2
     3:      b  6      6      3
     4:      c  7      7      4
     5:      c  8      8      5
     6:      c  9      9      6
     7:      f  4      4      1
     8:      f  5      5      2
     9:      f  6      6      3
    10:      g  7      7      4
    11:      g  8      8      5
    12:      g  9      9      6
    13:      j  4      4      1
    14:      j  5      5      2
    15:      j  6      6      3
    16:      k  7      7      4
    17:      k  8      8      5
    18:      k  9      9      6
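
    As a quick check of how this differs from filtering whole groups, here is a small sketch on made-up data. The table and the variable names rowwise and groupwise are hypothetical.

    ```r
    library(data.table)

    # Hypothetical data: "a" is entirely NA, "b" has a single NA.
    DT <- data.table(
      Series = c("a", "a", "b", "b"),
      Value  = c(NA, NA, 1, NA)
    )

    # Row-wise: removes every row that contains an NA.
    rowwise <- DT[complete.cases(DT)]

    # Group-wise: removes a Series only when all of its Value entries are NA.
    groupwise <- DT[, if (!all(is.na(Value))) .SD, by = Series]

    print(rowwise)
    print(groupwise)
    ```

    Here rowwise keeps only the one complete row of "b", whereas groupwise keeps both rows of "b", so which one is right depends on whether partially missing groups should stay intact.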
    