merging endpoints of a range with a sequence

余生分开走 2021-02-10 00:37

In one of my applications there is a piece of code that retrieves information from a data.table object depending on the values in another.

# say this tabl         


        
2 answers
  •  小蘑菇 (original poster)
    2021-02-10 00:46

    This is a great spot to use the roll argument of data.table:

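    # key both tables so the join runs on (id, date) against (id, start)
    setkey(dt1, id, date)
    setkey(dt, id, start)
    
    # roll = TRUE matches each dt1 row to the dt row with the same id and the
    # latest start on or before its date; in the joined result the start column
    # holds dt1's date, so end >= start keeps only dates inside the range
    dt[dt1, roll = TRUE][end >= start,
       list(start = start[1], end = end[1], result = mean(var)), by = id]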
    
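    # benchmark (microbenchmark() comes from the microbenchmark package, adply() from plyr)
    library(microbenchmark)
    library(plyr)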
    microbenchmark(OP    = adply(dt, 1, myfunc),
                   Frank = dt[dt1[as.list(dt[,seq.Date(start,end,"day"),by="id"])][,mean(var),by=id]],
                   eddi  = dt[dt1, roll = TRUE][end >= start,list(start = start[1], end = end[1], result = mean(var)), by = id])
    #Unit: milliseconds
    #  expr       min        lq    median        uq       max neval
    #    OP 24.436126 29.184786 30.853094 32.493521 50.898664   100
    # Frank  9.115676 11.303691 12.081000 13.122753 28.370415   100
    #  eddi  5.336315  6.323643  6.771898  7.497285  9.531376   100
    

    The time difference will become much more dramatic as the size of the datasets grows.
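
Since the sample tables from the question are cut off above, here is a minimal sketch of the same rolling join on made-up data. The table contents are purely hypothetical; only the column names (id, start, end, date, var) follow the answer's code. It is meant to illustrate how roll = TRUE behaves, not to reproduce the original data.

```r
library(data.table)

# Hypothetical toy data: dt holds one date range per id,
# dt1 holds one value per id and day.
dt <- data.table(id    = c(1L, 2L),
                 start = as.Date(c("2013-01-01", "2013-01-05")),
                 end   = as.Date(c("2013-01-03", "2013-01-06")))
dt1 <- data.table(id   = rep(c(1L, 2L), each = 7),
                  date = rep(seq(as.Date("2013-01-01"), by = "day", length.out = 7), 2),
                  var  = c(1:7, 11:17))

setkey(dt1, id, date)
setkey(dt, id, start)

# Each dt1 row is rolled back to the latest start on or before its date;
# rows dated before any start get NA in end and are dropped by the filter.
res <- dt[dt1, roll = TRUE][end >= start, list(result = mean(var)), by = id]
print(res)
```

With this toy data, id 1 averages the values for 2013-01-01 through 2013-01-03 and id 2 averages those for 2013-01-05 and 2013-01-06.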
