debugging: function to create multiple lags for multiple columns (dplyr)

甜味超标 2021-01-18 06:04

I want to create multiple lags of multiple variables, so I thought writing a function would be helpful. My code throws a warning ("Truncating vector to length 1") and fal

1 answer
  •  旧巷少年郎
    2021-01-18 06:33

    We can use shift() from data.table, which accepts a vector of values for 'n' and returns one lagged vector per value.

    library(data.table)
    setDT(df)[order(time), c("a", "b", "c") := shift(x, 1:3) , id][order(id, time)]
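
    The answer does not show df, so here is a small sketch to make the call above easier to follow. The data frame is an assumption reconstructed from the printed output further down. It shows that shift() with a vector of lags returns a list, which is why := can fill three columns at once.

    ```r
    library(data.table)

    # Assumed example data, reconstructed from the printed output in this answer
    df <- data.frame(id   = rep(1:2, each = 10),
                     time = rep(2000:2009, times = 2),
                     x    = c(1:10, 10:19))

    # With a vector of lags, shift() returns a list: one lagged vector per lag
    lags <- shift(df$x[df$id == 1], n = 1:3)
    str(lags)
    ```

    Each list element has the same length as the input, padded with NA at the start.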
    

    Now suppose we need to do this for multiple columns. shift() also accepts a list such as .SD and returns every column/lag combination:

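    # add a second column so there is more than one variable to lag
    df$y <- df$x
    # new names are built column by column: xa, xb, xc, ya, yb, yc
    setDT(df)[order(time), paste0(rep(c("x", "y"), each = 3),
                    c("a", "b", "c")) := shift(.SD, 1:3), id, .SDcols = x:y]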
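
    To see why the names are built with rep(..., each = 3), here is a self-contained sketch of the same step. The data, the cols vector and the new_names variable are assumptions added for illustration. shift() on .SD returns all lags of the first column, then all lags of the second, so the names have to follow that order.

    ```r
    library(data.table)

    # Assumed example data, reconstructed from the printed output in this answer
    df <- data.frame(id   = rep(1:2, each = 10),
                     time = rep(2000:2009, times = 2),
                     x    = c(1:10, 10:19))
    df$y <- df$x

    cols <- c("x", "y")
    # Column-major order: "xa" "xb" "xc" "ya" "yb" "yc"
    new_names <- paste0(rep(cols, each = 3), c("a", "b", "c"))

    dt <- as.data.table(df)
    setorder(dt, id, time)   # sort first so lags follow time within each id
    dt[, (new_names) := shift(.SD, 1:3), by = id, .SDcols = cols]
    ```

    Sorting with setorder() before the grouped assignment is a design choice made here so the result is already ordered by id and time.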
    

    shift() can also be used inside a dplyr pipeline

    library(data.table)  # provides shift()
    library(dplyr)
    df %>% 
      group_by(id) %>% 
      arrange(id, time) %>% 
      do(data.frame(., setNames(shift(.$x, 1:3), c("a", "b", "c"))))
    #    id  time     x     a     b     c
    #1      1  2000     1    NA    NA    NA
    #2      1  2001     2     1    NA    NA
    #3      1  2002     3     2     1    NA
    #4      1  2003     4     3     2     1
    #5      1  2004     5     4     3     2
    #6      1  2005     6     5     4     3
    #7      1  2006     7     6     5     4
    #8      1  2007     8     7     6     5
    #9      1  2008     9     8     7     6
    #10     1  2009    10     9     8     7
    #11     2  2000    10    NA    NA    NA
    #12     2  2001    11    10    NA    NA
    #13     2  2002    12    11    10    NA
    #14     2  2003    13    12    11    10
    #15     2  2004    14    13    12    11
    #16     2  2005    15    14    13    12
    #17     2  2006    16    15    14    13
    #18     2  2007    17    16    15    14
    #19     2  2008    18    17    16    15
    #20     2  2009    19    18    17    16
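
    Since the question asks for a dplyr function, here is a minimal sketch that uses only dplyr, with across() and dplyr::lag(). It is not part of the original answer. The helper name add_lags, its arguments, the column names id and time, and the naming scheme x_lag1, x_lag2, ... are all assumptions chosen for illustration.

    ```r
    library(dplyr)

    # Assumed example data, reconstructed from the printed output in this answer
    df <- data.frame(id   = rep(1:2, each = 10),
                     time = rep(2000:2009, times = 2),
                     x    = c(1:10, 10:19))
    df$y <- df$x * 2

    # Hypothetical helper: lag every column in `cols` by every value in `lags`,
    # within each id and ordered by time
    add_lags <- function(data, cols, lags = 1:3) {
      # Build one function per lag; force(k) pins the lag inside each closure
      lag_funs <- lapply(lags, function(k) {
        force(k)
        function(v) dplyr::lag(v, n = k)
      })
      names(lag_funs) <- paste0("lag", lags)

      data %>%
        group_by(id) %>%
        arrange(time, .by_group = TRUE) %>%
        mutate(across(all_of(cols), lag_funs, .names = "{.col}_{.fn}")) %>%
        ungroup()
    }

    res <- add_lags(df, c("x", "y"))
    ```

    Each call to dplyr::lag() receives a single value for n, so no vector of lags is ever passed to it.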
    
