My question is related to assignment by reference versus copying in data.table. I want to know if one can delete rows by reference, similar to
Here is a working function based on @vc273's answer and @Frank's feedback.
delete <- function(DT, del.idxs) {  # pls note 'del.idxs' (rows to drop) vs. 'keep.idxs'
  keep.idxs <- setdiff(seq_len(nrow(DT)), del.idxs)  # select row indexes to keep
  cols <- names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs])  # this is the subsetted table
  setnames(DT.subset, cols[1])
  for (col in cols[-1]) {  # cols[-1] is empty for a one-column table; cols[2:length(cols)] is not
    DT.subset[, (col) := DT[[col]][keep.idxs]]  # copy only the kept rows of this column
    DT[, (col) := NULL]  # delete the column from DT by reference
  }
  return(DT.subset)
}
An example of its usage:
dat <- delete(dat,del.idxs) ## Pls note 'del.idxs' instead of 'keep.idxs'
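Here "dat" is a data.table. The function strips columns from the table passed in, so the result has to be assigned back as shown. Removing about 14.7k rows from 1.4M rows takes 0.25 sec on my laptop.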
> dim(dat)
[1] 1419393 25
> system.time(dat <- delete(dat,del.idxs))
user system elapsed
0.23 0.02 0.25
> dim(dat)
[1] 1404715 25
>
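For a quick check, here is a minimal self-contained sketch on a toy table. The tables `toy` and `orig` and their values are made up for illustration, and the helper is repeated only so the snippet runs on its own. It compares the result with plain negative-index subsetting, which should give the same rows but builds the new table in one go, and it shows the side effect on the input table.

```r
library(data.table)

# Same helper as above, repeated so this snippet is runnable by itself.
delete <- function(DT, del.idxs) {
  keep.idxs <- setdiff(seq_len(nrow(DT)), del.idxs)
  cols <- names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs])
  setnames(DT.subset, cols[1])
  for (col in cols[-1]) {
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL]
  }
  DT.subset
}

# Hypothetical toy data, not from the timing run above.
toy <- data.table(id = 1:6, x = letters[1:6], y = (1:6) * 10)
del.idxs <- c(2, 5)

# Reference result via ordinary subsetting (copies all kept rows at once).
expected <- toy[-del.idxs]

# Column-by-column version; assign the result back to the same name.
toy <- delete(toy, del.idxs)
print(toy)
print(isTRUE(all.equal(toy, expected)))

# Side effect: the table passed in loses every column except the first.
orig <- data.table(id = 1:3, x = c("a", "b", "c"))
res <- delete(orig, 1)
print(names(orig))
```

The point of the loop is that each column of the old table is dropped right after its kept rows are copied, so the old and new tables are never both held in full. Whether that matters depends on how close the table is to the memory limit.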
PS. Since I am new to SO, I could not add a comment to @vc273's thread :-(