Weighted sum of variables by groups with data.table

长发绾君心 2021-01-12 16:15

I am looking for a way to compute the weighted sum of several variables by group with data.table. I hope the example is clear enough.

require(data.table)

dt          


        
1 answer
  •  广开言路
    2021-01-12 16:40

    Final attempt (copying Roland's answer :))

    Copying @Roland's excellent answer:

    print(dt[, lapply(.SD, function(x, w) sum(x*w), w=w), by=gr][, w := NULL])
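
    Since the question's own dt is not shown in full, here is a minimal self-contained sketch of the same idea on made-up data. The column names gr, w, V1 and V2 and all the values are assumptions for illustration only; the call itself is the one above.

    ```r
    library(data.table)

    # Hypothetical data: gr is the grouping column, w the weight column,
    # V1 and V2 are the columns to be summed with weights.
    dt <- data.table(
      gr = c(1, 1, 2, 2),
      w  = c(1, 2, 3, 4),
      V1 = c(10, 20, 30, 40),
      V2 = c(1, 1, 1, 1)
    )

    # .SD holds every non-grouping column (w, V1, V2); w is passed
    # explicitly to the function as an extra argument.
    res <- dt[, lapply(.SD, function(x, w) sum(x * w), w = w), by = gr]

    # The w column now holds sum(w * w), which is not wanted, so drop it.
    res[, w := NULL]

    print(res)
    ```

    With these values, group 1 should give V1 = 10*1 + 20*2 = 50 and V2 = 3, and group 2 should give V1 = 30*3 + 40*4 = 250 and V2 = 7.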
    

    Second attempt (still not the most efficient one):

    Following @Roland's comment, it's indeed faster to do the operation on all columns and then just remove the unwanted ones (as long as the operation itself is not time consuming, which is the case here).

    dt[, {lapply(.SD, function(x) sum(x*w))}, by=gr][, w := NULL][]
    

    For some reason, w does not seem to be found when I don't use {}. I have no idea why, though.


    Old (inefficient) answer:

    (Subsetting can be costly if there are too many groups)

    You can do this without .SDcols: leave w in .SD and drop it only when passing .SD to lapply, as follows:

    dt[, lapply(.SD[, -1, with=FALSE], function(x) sum(x*w)), by=gr]
    #    gr V1  V2  V3  V4
    # 1:  1 20 120 220 320
    # 2:  2 70 170 270 370
    

    Using .SDcols builds .SD without the w column, so w no longer exists within the scope of the .SD environment and cannot be used in the multiplication.
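
    A variant of this older approach that may be easier to read selects the value columns by name rather than by position, while still leaving w inside .SD. This is only a sketch on made-up data: the columns gr, w, V1, V2 and the helper vector cols are assumptions, not part of the original question, and it carries the same per-group subsetting cost noted above.

    ```r
    library(data.table)

    # Hypothetical data, not the question's original dt.
    dt <- data.table(
      gr = c(1, 1, 2, 2),
      w  = c(1, 2, 3, 4),
      V1 = c(10, 20, 30, 40),
      V2 = c(1, 1, 1, 1)
    )

    # Hypothetical helper: names of the columns to weight.
    cols <- c("V1", "V2")

    # No .SDcols here, so w stays available for each group; only the
    # columns named in cols are handed to lapply.
    res <- dt[, lapply(.SD[, cols, with = FALSE], function(x) sum(x * w)), by = gr]

    print(res)
    ```

    Selecting by name avoids depending on w being the first column, which the -1 in the call above relies on.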
