dplyr: lead() and lag() wrong when used with group_by()

旧巷少年郎 2020-12-02 09:46

I want to get the lead() and lag() values within each group, but I am getting wrong results.

For example, the data looks like this:

library(dplyr)
df = data.frame         


3 Answers
  • 2020-12-02 10:18

    It may be that stats::lag is being used instead of dplyr::lag (e.g. when restoring environments with the session package). This can easily slip through unnoticed, because stats::lag won't throw an error when used as in the question. Double-check by typing lag at the console, use the conflicted package, or disambiguate the call by writing dplyr::lag explicitly.
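A minimal sketch of the check described above, assuming dplyr is installed. The vector `x` is made up for illustration. It shows which package the bare name `lag` resolves to, and why the stats version fails silently: it only shifts the time base of a series and leaves the values where they are.

```r
library(dplyr)

# Which package does the bare name `lag` resolve to in this session?
# "dplyr" is what you want; "stats" means dplyr's version is masked or not attached.
lag_source <- environmentName(environment(lag))
print(lag_source)

x <- c(100, 60, 80)  # illustrative values only

# stats::lag() attaches a shifted time base but does not move the values,
# so inside mutate() the "lagged" column looks identical to the input.
stats_result <- as.vector(stats::lag(x, 1))

# dplyr::lag() actually shifts the values and pads with NA.
dplyr_result <- dplyr::lag(x)

print(stats_result)
print(dplyr_result)
```

Writing `dplyr::lag()` and `dplyr::lead()` explicitly removes the dependence on the order in which packages were attached.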

  • 2020-12-02 10:29

    It seems you have to pass an additional argument to the lag and lead functions. When I run your code without arrange, but with order_by added, everything seems to be OK.

    df %>%
      group_by(name) %>%
      mutate(next.score = lead(score, order_by = name),
             before.score = lag(score, order_by = name))
    

    Output:

      name score next.score before.score
    1   Al   100         60           NA
    2  Jen    80        100           NA
    3   Al    60         80          100
    4  Jen   100         60           80
    5   Al    80         NA           60
    6  Jen    60         NA          100
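For anyone who wants to rerun this, here is a self-contained sketch. The question's own data.frame call is cut off, so the data below is reconstructed from the output table above and is an assumption, not the asker's original code. The `dplyr::` prefixes are also an addition, there to rule out masking by stats::lag.

```r
library(dplyr)

# Data reconstructed from the printed output (assumption: the original call is not shown)
df <- data.frame(
  name  = c("Al", "Jen", "Al", "Jen", "Al", "Jen"),
  score = c(100, 80, 60, 100, 80, 60)
)

res <- df %>%
  group_by(name) %>%
  mutate(next.score   = dplyr::lead(score, order_by = name),
         before.score = dplyr::lag(score, order_by = name)) %>%
  ungroup()

print(res)
```

Within each name, next.score should hold the following score and before.score the preceding one, with NA at the group boundaries, as in the table above.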
    

    My sessionInfo():

    R version 3.1.1 (2014-07-10)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    
    locale:
    [1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250        LC_MONETARY=Polish_Poland.1250
    [4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250    
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] dplyr_0.4.1
    
    loaded via a namespace (and not attached):
    [1] assertthat_0.1  DBI_0.3.1       lazyeval_0.1.10 magrittr_1.5                parallel_3.1.1  Rcpp_0.11.5    
    [7] tools_3.1.1 
    
  • 2020-12-02 10:34

    Using order_by works well when you have only one grouping variable. With multiple grouping variables, I could not find any solution other than writing the table out and reading it back in to get rid of the grouping variables. It worked pretty well for me, but its efficiency depends on the size of the table.
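As a possible alternative to the write-and-read round trip, here is a hedged sketch that groups by two variables and then drops the grouping in memory with ungroup(). The data frame `df2` and its `term` column are hypothetical, invented only to get a second grouping variable. Whether this avoids the problem on the dplyr version discussed in this thread has not been verified here.

```r
library(dplyr)

# Hypothetical data: `term` is an invented second grouping column
df2 <- data.frame(
  name  = c("Al", "Al", "Al", "Al", "Jen", "Jen"),
  term  = c(1, 1, 2, 2, 1, 1),
  score = c(100, 60, 80, 90, 80, 100)
)

res2 <- df2 %>%
  group_by(name, term) %>%                      # two grouping variables
  mutate(before.score = dplyr::lag(score)) %>%  # lag within each (name, term) pair
  ungroup()                                     # remove grouping without touching disk

print(res2)
```

Calling ungroup() returns an ordinary, ungrouped table, which is the same end state that writing and re-reading the file produces.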
