How to create a lag variable within each group?

没有蜡笔的小新 2020-11-22 04:45

I have a data.table:

library(data.table)
set.seed(1)
data <- data.table(time = c(1:3, 1:4),
                   groups = c(rep(c("b", "a"), c(3, 4))),
                   v


        
5 Answers
  •  长发绾君心
    2020-11-22 05:13

    If you want to be sure of avoiding any problems caused by the order of the rows, you can do this manually with dplyr, using something like:

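    library(dplyr)
    
    # Two groups of different sizes with irregularly spaced dates
    df <- data.frame(Names = c(rep('Dan', 50), rep('Dave', 100)),
                     Dates = c(seq(1, 100, by = 2), seq(1, 100, by = 1)),
                     Values = rnorm(150, 0, 1))
    
    # Rank each row by date within its group; RankDown is the rank of the previous row
    df <- df %>% group_by(Names) %>% mutate(Rank = rank(Dates),
                                            RankDown = Rank - 1)
    
    # Self-join: attach the Values of the row whose Rank equals this row's RankDown
    df <- df %>%
      left_join(select(df, Rank, ValueDown = Values, Names),
                by = c('RankDown' = 'Rank', 'Names')) %>%
      select(-Rank, -RankDown)
    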
    head(df)
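    For comparison, here is a minimal sketch of the same lag using dplyr's built-in `lag()`: its `order_by` argument finds the previous value by sorting on a column rather than by row position, so it is also unaffected by how the rows are ordered. The small data frame `toy` is made up for illustration, with its rows deliberately out of date order.

```r
library(dplyr)

# Made-up toy data; row order is deliberately NOT the date order
toy <- data.frame(Names  = c('Dan', 'Dave', 'Dan', 'Dave', 'Dan'),
                  Dates  = c(3, 2, 1, 1, 5),
                  Values = c(30, 20, 10, 15, 50))

# Within each group, lag() with order_by takes the value from the row with
# the next-smaller date, then returns results in the original row order
toy_lagged <- toy %>%
  group_by(Names) %>%
  mutate(ValueDown = lag(Values, order_by = Dates)) %>%
  ungroup()

toy_lagged
```

    One difference worth knowing: if a group contains tied dates, `rank()` returns averaged ranks (e.g. 1.5) that the self-join above can fail to match, giving NA, whereas `lag()` with `order_by` simply follows the sorted order.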
    

    Alternatively, I like the idea of wrapping this in a function that takes the grouping variable(s), the ranking column (such as a date), and the number of lags. This also requires lazyeval in addition to dplyr.

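    library(dplyr)
    library(lazyeval)  # provides interp()
    
    # Adds ValueDown: the 'Values' entry `lag` ranks earlier within each group,
    # where rows are ranked by the `ranking` column. The 'Values' name is hard-coded.
    groupLag <- function(mydf, grouping, ranking, lag) {
      df <- mydf
      groupL <- lapply(grouping, as.symbol)
    
      # Rank = within-group rank of the ranking column; RankDown = Rank - lag
      names <- c('Rank', 'RankDown')
      foos <- list(interp(~rank(var), var = as.name(ranking)), ~Rank - lag)
    
      df <- df %>% group_by_(.dots = groupL) %>% mutate_(.dots = setNames(foos, names))
    
      # Lookup table of rank, value and group keys, with Values renamed to ValueDown
      selectedNames <- c('Rank', 'Values', grouping)
      df2 <- df %>% select_(.dots = selectedNames)
      colnames(df2) <- c('Rank', 'ValueDown', grouping)
    
      # Match each row's RankDown to the Rank of an earlier row in the same group
      df <- df %>%
        left_join(df2, by = c('RankDown' = 'Rank', grouping)) %>%
        select(-Rank, -RankDown)
    
      return(df)
    }
    
    # df here is the freshly created data frame (without the ValueDown column added above)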
    groupLag(df,c('Names'),c('Dates'),1)
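    The underscore verbs used here (`group_by_()`, `mutate_()`, `select_()`) have been deprecated since dplyr 0.7 in favour of tidy evaluation. A minimal sketch of the same rank-and-self-join approach written with tidy evaluation might look like the following. The name `groupLagTidy` and the extra `value` argument (so the lagged column is not hard-coded as `Values`) are hypothetical additions, not part of the answer above, and unlike the original this version returns an ungrouped result.

```r
library(dplyr)

# Hypothetical tidy-eval version of groupLag(); 'value' is an assumed extra
# argument naming the column to lag (the original hard-codes 'Values').
groupLagTidy <- function(mydf, grouping, ranking, lag = 1, value = 'Values') {
  # Within-group rank of the ranking column, and the rank `lag` steps earlier
  ranked <- mydf %>%
    group_by(!!!rlang::syms(grouping)) %>%
    mutate(Rank = rank(.data[[ranking]]),
           RankDown = Rank - .env$lag) %>%
    ungroup()

  # Lookup table of group keys, rank and the value to carry forward
  lookup <- ranked %>%
    select(all_of(grouping), Rank, ValueDown = all_of(value))

  # Self-join: match each row's RankDown to an earlier row's Rank in its group
  ranked %>%
    left_join(lookup, by = c(setNames('Rank', 'RankDown'), grouping)) %>%
    select(-Rank, -RankDown)
}

# Made-up toy data with rows out of date order
toy <- data.frame(Names  = c('Dan', 'Dave', 'Dan', 'Dave', 'Dan'),
                  Dates  = c(3, 2, 1, 1, 5),
                  Values = c(30, 20, 10, 15, 50))

groupLagTidy(toy, 'Names', 'Dates', 1)
```

    Passing column names as strings keeps the same call shape as `groupLag(df, c('Names'), c('Dates'), 1)`, while `.data[[ranking]]` and `all_of()` replace `interp()` and the `.dots` arguments.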
    
