I have a large R data.table with a multi-column key, where some value columns contain NA values. I'd like to remove groups that are entirely NA in one or more value columns.
You can do this to keep only those Series groups in which not ALL of the Value entries are NA:
setkey(DT, "Series")
DT[, .SD[(!all(is.na(Value)))], by=Series]
The parentheses around !all are needed to avoid the not-join syntax, which Matthew will look into (see comments). It is the same as this:
DT[, .SD[as.logical(!all(is.na(Value)))], by=Series]
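A minimal, self-contained sketch of the single-column filter above, run on a small made-up table. The data and the result name res are hypothetical, chosen only to show one group that is entirely NA and one that is only partly NA.

```r
library(data.table)

# Hypothetical toy data (not from the question): Series "b" is entirely NA
DT <- data.table(Series = c("a", "a", "b", "b", "c"),
                 Value  = c(1, NA, NA, NA, 3))
setkey(DT, "Series")

# Within each Series, !all(is.na(Value)) is a single TRUE/FALSE:
# .SD[(TRUE)] keeps every row of the group, .SD[(FALSE)] keeps none
res <- DT[, .SD[(!all(is.na(Value)))], by = Series]
print(res)
```

Group "a" survives because only one of its two values is NA, while group "b" disappears completely.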
Building on that to answer the new, clarified question:
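allNA = function(x) all(is.na(x)) # helper: TRUE if every element of x is NA
# for each grouping column in turn, drop any group in which some column is entirely NA
for (i in c("Id","Series"))
  DT = DT[, if (!any(sapply(.SD,allNA))) .SD else NULL, by=i]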
DT
Series Id Value1 Value2
1: i 1 1 1
2: i 2 2 2
3: i 3 3 3
4: b 5 5 2
5: b 6 6 3
6: f 5 5 2
7: f 6 6 3
8: j 5 5 5
9: j 6 6 6
10: c 7 7 4
11: c 8 8 NA
12: c 9 9 6
13: g 7 7 4
14: g 8 8 5
15: g 9 9 6
16: k 7 7 7
17: k 8 8 8
18: k 9 9 9
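A small self-contained run of the same loop, on hypothetical data built so that one Series group ("b") is entirely NA in Value2 and one Id group (3) is entirely NA in Value1. The loop variable by_col is just an illustrative name; like i in the answer, it holds a column name that data.table resolves from the calling scope.

```r
library(data.table)

# Hypothetical toy data: Series "b" has Value2 entirely NA,
# and Id 3 has Value1 entirely NA
DT <- data.table(
  Series = c("a", "a", "b", "b", "c", "c"),
  Id     = c(1L, 2L, 1L, 2L, 2L, 3L),
  Value1 = c(1, 2, 3, 4, 5, NA),
  Value2 = c(1, NA, NA, NA, 6, 7)
)

# TRUE when every element of x is NA
allNA <- function(x) all(is.na(x))

# For each grouping column in turn, keep a group only if none of its
# .SD columns is entirely NA; returning NULL drops the group
for (by_col in c("Id", "Series"))
  DT <- DT[, if (!any(sapply(.SD, allNA))) .SD else NULL, by = by_col]

print(DT)
```

Both the Id-level drop and the Series-level drop take effect. Because each pass puts its by column first, the final column order is Series, Id, Value1, Value2.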
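That changes the row order, though, so it isn't precisely the result requested. The following keeps the original order and should be faster too, since it only updates a flag column by reference instead of rebuilding DT on every pass.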
# starting fresh from original DT in question again
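DT[,drop:=FALSE]    # helper flag column, added by reference
for (i in c("Series","Id"))    # flag whole groups in which some column is entirely NA
  DT[,drop:=drop|any(sapply(.SD,allNA)),by=i]
DT[(!drop)][,drop:=NULL][]    # keep unflagged rows, then remove the flag column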
Series Id Value1 Value2
1: b 5 5 2
2: b 6 6 3
3: c 7 7 4
4: c 8 8 NA
5: c 9 9 6
6: f 5 5 2
7: f 6 6 3
8: g 7 7 4
9: g 8 8 5
10: g 9 9 6
11: i 1 1 1
12: i 2 2 2
13: i 3 3 3
14: j 5 5 5
15: j 6 6 6
16: k 7 7 7
17: k 8 8 8
18: k 9 9 9
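A self-contained sketch of the flag-column approach on hypothetical data whose rows are deliberately not ordered by Id, so that re-grouping could shuffle them. The names orig, flagged and kept are illustrative only. copy() is used so the flag column is not added to orig by reference.

```r
library(data.table)

allNA <- function(x) all(is.na(x))

# Hypothetical toy data: Series "z" (and so Id 3) has Value2 entirely NA
orig <- data.table(
  Series = c("x", "y", "y", "z"),
  Id     = c(2L, 1L, 2L, 3L),
  Value1 = c(1, 2, 3, 4),
  Value2 = c(5, 6, 7, NA)
)

flagged <- copy(orig)            # work on a copy; := modifies in place
flagged[, drop := FALSE]         # start with nothing flagged
for (by_col in c("Series", "Id"))
  # once a row is flagged it stays flagged (drop | ...)
  flagged[, drop := drop | any(sapply(.SD, allNA)), by = by_col]

# Filter once; the surviving rows keep their original relative order
kept <- flagged[(!drop)][, drop := NULL][]
print(kept)
```

Rows x/2, y/1, y/2 come back in their original sequence, and the helper column is gone from the result.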
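What about using the complete.cases function?
DT[complete.cases(DT),]
It drops every row that has an NA in any column. Note that it works row by row rather than per group, so it also removes rows from groups that are only partly NA.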
> DT[complete.cases(DT),]
Series Id Value1 Value2
1: b 4 4 1
2: b 5 5 2
3: b 6 6 3
4: c 7 7 4
5: c 8 8 5
6: c 9 9 6
7: f 4 4 1
8: f 5 5 2
9: f 6 6 3
10: g 7 7 4
11: g 8 8 5
12: g 9 9 6
13: j 4 4 1
14: j 5 5 2
15: j 6 6 3
16: k 7 7 4
17: k 8 8 5
18: k 9 9 6
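To make the row-wise versus group-wise difference concrete, here is a small sketch on hypothetical data. rowwise and groupwise are illustrative names, and the group-wise line uses the common data.table idiom of returning .SD only when the condition holds (a bare if with no else yields NULL, which drops the group).

```r
library(data.table)

# Hypothetical toy data: group "a" is partly NA, group "b" is entirely NA
DT <- data.table(Series = c("a", "a", "b", "b"),
                 Value  = c(1, NA, NA, NA))

# Row-wise: removes every row that has an NA in any column
rowwise <- DT[complete.cases(DT), ]

# Group-wise: removes only groups whose Value is entirely NA
groupwise <- DT[, if (!all(is.na(Value))) .SD, by = Series]

print(rowwise)
print(groupwise)
```

complete.cases keeps only the single complete row of "a", while the group-wise filter keeps both rows of "a" (including its NA) and drops "b".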