Invalid .internal.selfref in data.table

半阙折子戏 2020-12-31 06:59

I needed to assign a "second" id to group some values inside my original id. This is my sample data:

dt<-structure(list(id = c("aaaa", "


        
3 Answers
  • 2020-12-31 07:34

    Yes, the problem is the list. Here is a simple example:

    library(data.table)

    DT <- data.table(1:5)
    mylist1 <- list(DT,"a")                 # existing DT is copied into the list
    mylist1[[1]][,id:=.I]
    #warning
    
    mylist2 <- list(data.table(1:5),"a")    # table is created inside the list
    mylist2[[1]][,id:=.I]
    #no warning
    

    You should avoid copying a data.table into a list (and to be on the safe side I would avoid having a DT in a list at all). Try this:

    f1 <- function(){
      mylist <- list(res=data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
                     period = c("start", "end", "start", "end", "start", "end"),
                     date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date"))))
      other_results <- ""
      mylist$other_results <- other_results
      mylist
    }
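
    If an independent copy of an existing table really is needed inside the list, one option (an alternative to the approach above, not something this answer proposes) is data.table's own `copy()`. It should return a table that is safe to modify by reference, at the cost of a full memory copy. A minimal sketch, with made-up variable names:

    ```r
    library(data.table)

    DT <- data.table(V1 = 1:5)

    # copy() makes a deep copy through data.table itself, so the
    # result is a regular data.table rather than a plain R copy
    mylist <- list(res = copy(DT), other_results = "a")

    # add a column by reference to the table stored in the list
    mylist$res[, id := .I]

    names(mylist$res)  # includes "id"
    names(DT)          # still only "V1"; the original is untouched
    ```

    The trade-off is memory: unlike creating the table directly inside the list, this duplicates the data.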
    
  • 2020-12-31 07:39

    You could make a "shallow copy" while creating the list, so that 1) you don't do a full memory copy (speed isn't affected) and 2) you don't get the internal selfref warning (thanks to @mnel for this trick).

    Creating data:

    set.seed(45)
    ss <- function() {
        tt <- sample(1:10, 1e6, replace=TRUE)
    }
    tt <- replicate(100, ss(), simplify=FALSE)
    tt <- as.data.table(tt)
    

    How you should go about creating the list (shallow copy):

    > system.time( {
    +     ll <- list(d1 = { # shallow copy here...
    +         data.table:::settruelength(tt, 0)
    +         invisible(alloc.col(tt))
    +     }, "a")
    + })
       user  system elapsed
          0       0       0
    > system.time(tt[, bla := 2])
       user  system elapsed
      0.012   0.000   0.013
    > system.time(ll[[1]][, bla :=2 ])
       user  system elapsed
      0.008   0.000   0.010
    

    So you don't compromise on speed, and you don't get a warning followed by a full copy. Hope this helps.

  • 2020-12-31 07:51

    "Invalid .internal.selfref detected and fixed by taking a copy..."

    There is no need to make a copy when assigning id2 within f2(); you can add the column directly by changing:

    # From:
    x <- x[, id2 := which(vapply(groups, function(x,y) any(x==y), .BY[[1]], FUN.VALUE=TRUE)), by=id]
    
    # To something along the lines of:
    x$id2 <- findInterval( match( x$id, unlist(groups)), cumsum(c(0,sapply(groups, length)))+1)
    

    Then you can continue to use your 'x' data.table as normal without incurring a warning.
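
    For comparison, the same mapping can also be written as a data.table update join, which adds the column by reference. This is only a sketch: the question's data is truncated, so the `groups` list and the small `x` table below are hypothetical stand-ins.

    ```r
    library(data.table)

    # hypothetical stand-ins for the question's data
    x <- data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"))
    groups <- list(c("aaaa", "aaas"), "bbbb")

    # one row per id, with the index of the group that contains it
    lookup <- data.table(id  = unlist(groups),
                         id2 = rep(seq_along(groups), lengths(groups)))

    # update join: fill id2 in x by matching on id
    x[lookup, id2 := i.id2, on = "id"]

    # the findInterval() expression above gives the same grouping
    via_interval <- findInterval(match(x$id, unlist(groups)),
                                 cumsum(c(0, sapply(groups, length))) + 1)
    stopifnot(identical(x$id2, via_interval))
    ```

    Here both ids in the first group get id2 = 1 and "bbbb" gets id2 = 2.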

    Also, to simply suppress the warning you can use suppressWarnings() around the f2(x[["res"]]) call.

    Even on small tables there can be a substantial performance difference:

    Performance Comparison:
    Unit: milliseconds
                           expr      min       lq   median       uq      max neval
                       f.main() 2.896716 2.982045 3.034334 3.137628 7.542367   100
     suppressWarnings(f.main()) 3.005142 3.081811 3.133137 3.210126 5.363575   100
                f.main.direct() 1.279303 1.384521 1.413713 1.486853 5.684363   100
    