Why is using update on a lm inside a grouped data.table losing its model data?

悲&欢浪女 2020-11-27 06:57

Ok, this is a weird one. I suspect this is a bug inside data.table, but it would be useful if anyone can explain why this is happening - what is update

1 answer
  • 2020-11-27 07:35

    This is not an answer, but it is too long for a comment.

    The .Environment attribute of the terms component is identical for every resulting model:

    e1 <- attr(fit[['V1']][[1]]$terms, '.Environment')
    e2 <- attr(fit[['V1']][[2]]$terms, '.Environment')
    e3 <- attr(fit[['V1']][[3]]$terms, '.Environment')
    identical(e1,e2)
    ## TRUE
    identical(e2, e3)
    ## TRUE
    

    It appears that data.table reuses the same environment (the same bit of memory, in my non-technical terms) for every by-group evaluation of j, which is efficient. However, when update is called, it refits the model by looking the variables up in that shared environment, which by then holds the values from the last group.
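A minimal base-R sketch of the mechanism described above, without data.table. Here env is a hypothetical stand-in for the environment that appears to be reused across groups, and x and y are made-up variables. The sketch relies on lm looking its variables up in the formula's environment when no data argument is given, and on update re-evaluating the stored call with that same formula:

```r
# A model fitted without `data =` finds its variables in the environment
# attached to its formula; update() refits from that same environment.
env <- new.env()                 # hypothetical stand-in for the shared environment
env$x <- 1:10
env$y <- 2 * (1:10) + rep(c(0.5, -0.5), 5)

# Fit inside env, so the formula (and the terms) remember env.
m <- local(lm(y ~ x), envir = env)
n_before <- nobs(m)              # 10 observations

# Overwrite the variables in place, as a later group would.
env$x <- 1:5
env$y <- 2 * (1:5) + c(0.5, -0.5, 0.5, -0.5, 0.5)

# Refit with an unchanged formula: the data now come from the modified env.
m2 <- update(m, . ~ .)
n_after <- nobs(m2)              # 5 observations

# The original fit still points at the shared environment.
same_env <- identical(attr(m$terms, ".Environment"), env)
```

The original fit keeps its 10 observations, but anything that refits it picks up whatever the environment contains at that moment. That is consistent with every updated model seeing the last group's values.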

    So, if you fudge this by giving each model its own copy of the group's data as its environment, it will work:

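    fit <- DT[, {
      # copy this group's columns into a fresh environment
      xx <- list2env(copy(.SD))
      mymodel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length)
      # point the model's terms at the per-group environment
      attr(mymodel$terms, '.Environment') <- xx
      list(list(mymodel))
    }, by = 'Species']
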
    lfit2 <- fit[, list(list(update(V1[[1]], ~ . - Sepal.Width))), by = Species]
    lfit2[, lapply(V1, nobs)]
    ##    V1 V2 V3
    ## 1: 41 39 42
    # using your exact diagnostic code
    lfit2[, nobs(V1[[1]]), by = Species]
    ##       Species V1
    ## 1:     setosa 41
    ## 2: versicolor 39
    ## 3:  virginica 42
    

    Not a long-term solution, but at least a workaround.
