I needed to assign a "second" id to group some values inside my original id. This is my sample data:
dt<-structure(list(id = c("aaaa", "
Yes, the problem is the list. Here is a simple example:
DT <- data.table(1:5)
mylist1 <- list(DT,"a")
mylist1[[1]][,id:=.I]
#warning
mylist2 <- list(data.table(1:5),"a")
mylist2[[1]][,id:=.I]
#no warning
You should avoid copying a data.table into a list (and to be on the safe side I would avoid having a DT in a list at all). Try this:
f1 <- function(){
  mylist <- list(res=data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
                                period = c("start", "end", "start", "end", "start", "end"),
                                date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date"))))
  other_results <- ""
  mylist$other_results <- other_results
  mylist
}
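As a usage sketch of this pattern (the helper name `make_results` and the column `row_id` are hypothetical, and the table is shortened for brevity): when the data.table is built inside the `list()` call, adding a column by reference to the list element should work without the warning, at least on current R and data.table versions.

```r
library(data.table)

# Hypothetical helper mirroring f1(): the data.table is created
# directly inside list(), so no separate copy is placed in the list.
make_results <- function() {
  out <- list(res = data.table(id = c("aaaa", "aaaa", "bbbb"),
                               period = c("start", "end", "start")))
  out$other_results <- ""
  out
}

results <- make_results()

# Add a column by reference on the list element
results$res[, row_id := .I]
print(names(results$res))
```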
You could make a "shallow copy" while creating the list, so that 1) you don't do a full memory copy (speed isn't affected) and 2) you don't get the internal selfref warning (thanks to @mnel for this trick).
set.seed(45)
ss <- function() {
  tt <- sample(1:10, 1e6, replace=TRUE)
}
tt <- replicate(100, ss(), simplify=FALSE)
tt <- as.data.table(tt)
system.time( {
  ll <- list(d1 = { # shallow copy here...
    data.table:::settruelength(tt, 0)
    invisible(alloc.col(tt))
  }, "a")
})
user system elapsed
0 0 0
> system.time(tt[, bla := 2])
user system elapsed
0.012 0.000 0.013
> system.time(ll[[1]][, bla :=2 ])
user system elapsed
0.008 0.000 0.010
So you don't compromise in speed and you don't get a warning followed by a full copy. Hope this helps.
"Invalid .internal.selfref detected and fixed by taking a copy..."
There is no need to make a copy when assigning id2 within f2(); you can add the column directly by changing:
# From:
x <- x[, id2 := which(vapply(groups, function(x,y) any(x==y), .BY[[1]], FUN.VALUE=T)), by=id]
# To something along the lines of:
x$id2 <- findInterval( match( x$id, unlist(groups)), cumsum(c(0,sapply(groups, length)))+1)
Then you can continue to use your 'x' data.table as usual without incurring a warning.
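To see how the findInterval()/match() expression arrives at the group index, here is a small worked sketch. The `groups` list below is an assumption made up for illustration (the original definition is not shown here), as are the intermediate names `pos`, `starts`, and `group_index`.

```r
library(data.table)

# Hypothetical grouping: ids "aaaa" and "aaas" share group 1, "bbbb" is group 2
groups <- list(c("aaaa", "aaas"), "bbbb")

x <- data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"))

# Position of each id in the flattened vector of all group members
pos <- match(x$id, unlist(groups))

# First flattened position of each group, plus one value past the end
starts <- cumsum(c(0, sapply(groups, length))) + 1

# findInterval() maps each position to the index of the group containing it
group_index <- findInterval(pos, starts)

# Attach the result as a new column
x[, id2 := group_index]
print(x)
```

Note that this assumes each id appears in exactly one group; an id missing from `groups` would give NA from match().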
Also, to simply suppress the warning, you can wrap the f2(x[["res"]]) call in suppressWarnings().
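One caveat is that suppressWarnings() hides every warning raised by the call, not just this one. A more selective variant, sketched below with base R only, muffles a warning only when its message mentions the selfref text. The helper `muffle_selfref` and the stand-in function `noisy` are hypothetical names used for illustration; `noisy` simply mimics the warning text so the sketch runs without the original f2().

```r
# Hypothetical helper: muffle only warnings whose message mentions
# ".internal.selfref"; all other warnings pass through unchanged.
muffle_selfref <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(".internal.selfref", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for a call that raises the selfref warning (illustration only)
noisy <- function() {
  warning("Invalid .internal.selfref detected and fixed by taking a copy")
  "done"
}

res <- muffle_selfref(noisy())
print(res)
```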
Even on small tables there can be a substantial performance difference; here the direct assignment takes less than half the median time:
Performance comparison:
Unit: milliseconds
                       expr      min       lq   median       uq      max neval
                   f.main() 2.896716 2.982045 3.034334 3.137628 7.542367   100
 suppressWarnings(f.main()) 3.005142 3.081811 3.133137 3.210126 5.363575   100
            f.main.direct() 1.279303 1.384521 1.413713 1.486853 5.684363   100