I have a data frame with multiple time series identified by unique ids. I would like to remove any time series that has only 0 values.
The data frame looks as follows:
If dat is a data.table, then this is easy to write and read:
dat[,.SD[any(value!=0)],by=id]
.SD stands for Subset of Data. This answer explains .SD very well.
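Why .SD[any(value!=0)] drops whole groups: any() returns a single TRUE or FALSE for each id, and a length-one logical row index is recycled over every row, so each group is either kept entirely or dropped entirely (data.table recycles a length-one logical i the same way). A base R sketch of that indexing rule on plain data frames, using hypothetical toy data where series "a" is all zeros:

```r
# Hypothetical toy data: series "a" is all zeros, series "b" is not.
dat <- data.frame(id = c("a", "a", "b", "b"), value = c(0, 0, 0, 5))

# Each group's rows on their own, playing the role of .SD inside by = id.
sd_a <- dat[dat$id == "a", ]
sd_b <- dat[dat$id == "b", ]

# A single TRUE/FALSE as the row index is recycled over all rows of the group:
# FALSE keeps none of them, TRUE keeps all of them.
kept_a <- sd_a[any(sd_a$value != 0), ]  # any() is FALSE here, so 0 rows
kept_b <- sd_b[any(sd_b$value != 0), ]  # any() is TRUE here, so both rows

nrow(kept_a)
nrow(kept_b)
```

Stacking the per-group results back together (which by = id does for you) leaves only the series with at least one non-zero value.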
Picking up on Gabor's nice use of ave, but without repeating the same variable name (DF) three times (which can be a source of typo bugs if you have many long or similar variable names), try:
dat[ ave(value!=0,id,FUN=any) ]
The difference in speed between those two may depend on several factors, including: i) the number of groups, ii) the size of each group, and iii) the number of columns in the real dat.
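The ave() call works as a row index because ave() applies FUN within each id group and spreads the single per-group result back to every row of that group, so it returns one TRUE/FALSE per row. A base R sketch on hypothetical toy data (column names id and value as in the question) showing that mask:

```r
# Hypothetical toy data: "a" is all zeros, "b" and "c" each have a non-zero value.
dat <- data.frame(
  id    = c("a", "a", "a", "b", "b", "b", "c", "c"),
  value = c(0, 0, 0, 1, 0, 2, 0, 3)
)

# any() gives one logical per id; ave() repeats it for every row of that id.
keep <- ave(dat$value != 0, dat$id, FUN = any)
keep  # FALSE for the three "a" rows, TRUE for all "b" and "c" rows

# Using the mask as the row index drops series "a" entirely.
dat[keep, ]
```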
An easy plyr solution would be:
ddply(mydat,"id",function(x) if (all(x$value==0)) NULL else x)
(seems to work OK), but there may be a faster solution with data.table...
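The same split-apply-combine idea can be written without plyr. This is a base R sketch, not the answer's code: instead of returning NULL for all-zero groups, it uses Filter() to drop them before recombining. Toy data is hypothetical:

```r
# Hypothetical toy data: "a" is all zeros, "b" and "c" are not.
dat <- data.frame(
  id    = c("a", "a", "a", "b", "b", "b", "c", "c"),
  value = c(0, 0, 0, 1, 0, 2, 0, 3)
)

# One data frame per id (in sorted id order, like ddply).
groups <- split(dat, dat$id)

# Keep only groups with at least one non-zero value.
nonzero <- Filter(function(x) any(x$value != 0), groups)

# Stack the survivors; rbind prefixes row names with the group name, so reset them.
result <- do.call(rbind, nonzero)
rownames(result) <- NULL
result
```

If every series were all zeros, nonzero would be an empty list and do.call(rbind, ...) would return NULL rather than an empty data frame.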
Try this. No packages are used.
DF[ ave(DF$value != 0, DF$id, FUN = any), ]
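A self-contained run of this answer on hypothetical toy data, next to another base R formulation that is not from the thread: collect the ids that have any non-zero value, then keep every row whose id is in that set with %in%. Both produce the same rows here:

```r
# Hypothetical toy data: "a" is all zeros, "b" and "c" are not.
DF <- data.frame(
  id    = c("a", "a", "a", "b", "b", "b", "c", "c"),
  value = c(0, 0, 0, 1, 0, 2, 0, 3)
)

# The ave() answer: per-row mask, TRUE for every row of an id with a non-zero value.
res_ave <- DF[ave(DF$value != 0, DF$id, FUN = any), ]

# Alternative: the set of ids to keep, then a membership test per row.
ids_keep <- unique(DF$id[DF$value != 0])
res_in   <- DF[DF$id %in% ids_keep, ]

identical(res_ave, res_in)  # TRUE
```

The %in% version builds the keep-set once instead of calling a function per group, which may matter when there are many small groups, but I have not timed it.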