How to delete a row by reference in data.table?

南方客 2020-11-22 16:07

My question is related to assignment by reference versus copying in data.table. I want to know if one can delete rows by reference, similar to

6 Answers
  •  情话喂你
    2020-11-22 16:45

    Here is a working function based on @vc273's answer and @Frank's feedback.

    library(data.table)

    delete <- function(DT, del.idxs) {             # note: takes 'del.idxs', not 'keep.idxs'
      keep.idxs <- setdiff(seq_len(nrow(DT)), del.idxs)  # row indexes to keep
      cols <- names(DT)
      DT.subset <- data.table(DT[[1]][keep.idxs])  # start the subset with the first column
      setnames(DT.subset, cols[1])
      for (col in cols[-1]) {                      # cols[-1] is empty for a one-column table
        DT.subset[, (col) := DT[[col]][keep.idxs]] # copy the kept rows of this column
        DT[, (col) := NULL]                        # drop it from the original to free memory
      }
      DT.subset
    }
    

    An example of its usage:

    dat <- delete(dat,del.idxs)   ## Pls note 'del.idxs' instead of 'keep.idxs'
    

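    Here "dat" is a data.table. Note that the result must be assigned back to "dat": the function returns a new table and removes columns from the one passed in. Removing about 14.7k rows from a 1.4M-row table takes 0.25 sec on my laptop.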

    > dim(dat)
    [1] 1419393      25
    > system.time(dat <- delete(dat,del.idxs))
       user  system elapsed 
       0.23    0.02    0.25 
    > dim(dat)
    [1] 1404715      25
    > 
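
    To make the behaviour easier to check, here is a minimal sketch on a toy table. The data and the names "old" and "expected" are made up for illustration, and the compact copy of delete() is repeated only so the snippet runs on its own. It suggests two things: the result matches an ordinary copying subset, and the table that was passed in loses all but its first column, because the columns are dropped by reference.

```r
library(data.table)

# Compact copy of the delete() helper from this answer, repeated here
# so that the snippet is self-contained.
delete <- function(DT, del.idxs) {
  keep.idxs <- setdiff(seq_len(nrow(DT)), del.idxs)
  cols <- names(DT)
  DT.subset <- data.table(DT[[1]][keep.idxs])
  setnames(DT.subset, cols[1])
  for (col in cols[-1]) {
    DT.subset[, (col) := DT[[col]][keep.idxs]]
    DT[, (col) := NULL]
  }
  DT.subset
}

# Hypothetical toy data: 6 rows, 3 columns.
dat <- data.table(id = 1:6, x = letters[1:6], y = (1:6) / 2)
del.idxs <- c(2L, 5L)

expected <- dat[-del.idxs]  # ordinary subset: a new, independent table
old <- dat                  # a second name for the same object, no copy made

dat <- delete(dat, del.idxs)

all.equal(dat, expected)    # same rows and columns as the copying subset
names(old)                  # only "id" is left in the original object
```

    Since the original object is consumed along the way, any other variable that still points to it (like "old" here) ends up with a single column. Take a copy() first if the full table is still needed elsewhere.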
    

    PS. Since I am new to SO, I could not add a comment to @vc273's thread :-(
