How to use dplyr as an alternative to aggregate

Happy的楠姐 2021-01-24 11:15

I have a dataframe times that looks like this:

user     time
A        7/7/2010
B        7/12/2010
C        7/12/2010
A        7/12/2010 
C        7/         
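
For a reproducible version, the sample can be rebuilt in R as below. This is a sketch under stated assumptions: only the four complete rows are used because the fifth row is cut off above, and the name counts is hypothetical. It also shows a base-R aggregate() call, counting rows per user, of the kind the title refers to.

```r
# Hypothetical reconstruction of `times`: only the four complete rows
# from the question; the fifth row ("C 7/...") is truncated and omitted.
times <- data.frame(
  user = c("A", "B", "C", "A"),
  time = c("7/7/2010", "7/12/2010", "7/12/2010", "7/12/2010"),
  stringsAsFactors = FALSE
)
times$time <- as.Date(times$time, "%m/%d/%Y")

# Base-R aggregate() with the formula interface: number of rows per user.
counts <- aggregate(time ~ user, data = times, FUN = length)
names(counts)[2] <- "n"
print(counts)
```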


        
  清歌不尽
    2021-01-24 12:01

    Based on the dplyr solution by eipi10 and the suggestion of nrussell, I've written the following solution using data.table.
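
Since the question asks about dplyr, here is a minimal sketch of the same kind of per-user summary written with dplyr verbs. It is not eipi10's original code, which is not reproduced on this page; the sample data are the four complete rows from the question, and summary_dplyr is an illustrative name.

```r
library(dplyr)

# Hypothetical sample: the four complete rows from the question.
times <- data.frame(
  user = c("A", "B", "C", "A"),
  time = c("7/7/2010", "7/12/2010", "7/12/2010", "7/12/2010"),
  stringsAsFactors = FALSE
)

summary_dplyr <- times %>%
  mutate(time = as.Date(time, "%m/%d/%Y")) %>%
  arrange(user, time) %>%              # diff() needs each user's dates in order
  group_by(user) %>%
  summarise(
    first    = min(time),
    last     = max(time),
    n        = n(),
    meandiff = mean(diff(time)),       # NaN for users with a single row
    .groups  = "drop"
  )

print(summary_dplyr)
```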

    First, convert the time column from text to the Date class:

    times$time <- as.Date(times$time, "%m/%d/%Y")
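
A quick way to check that the format string matches the data before converting the whole column. The vector raw is a hypothetical example built from the question's values.

```r
# Strings in the question's month/day/year layout, without zero padding.
raw <- c("7/7/2010", "7/12/2010")

# "%m/%d/%Y" = month/day/four-digit year; single-digit months and days
# such as "7/7/2010" are accepted.
parsed <- as.Date(raw, "%m/%d/%Y")
print(parsed)

# Date arithmetic now works: diff() returns a difftime in days.
print(diff(parsed))

# A string that does not match the format silently becomes NA,
# so it is worth checking for NAs after converting.
bad <- as.Date("2010-07-07", "%m/%d/%Y")
print(is.na(bad))
```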
    

    Then you'll need to convert times to a data.table using:

    library(data.table)
    times <- as.data.table(times)
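
If you would rather not reassign times, data.table also provides setDT(), which converts a data frame in place by reference instead of returning a copy. A minimal sketch on the question's four complete rows (times_dt is an illustrative name):

```r
library(data.table)

# Hypothetical sample: the four complete rows from the question.
times <- data.frame(
  user = c("A", "B", "C", "A"),
  time = as.Date(c("7/7/2010", "7/12/2010", "7/12/2010", "7/12/2010"),
                 "%m/%d/%Y")
)

# Option 1: build a new data.table and keep the original data.frame.
times_dt <- as.data.table(times)

# Option 2: setDT() turns `times` itself into a data.table, without copying.
setDT(times)

print(class(times))
```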
    

    Overwriting times was useful for my purposes, but you may prefer to assign the data.table to a new variable instead. Once your data frame is a data.table, run:

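    # Sort rows by user and date so diff() sees each user's dates in order,
    # then summarise per user. `by` is an argument of [.data.table, so it
    # must sit outside the .() list of summary columns.
    # (Users with a single row get NaN for meandiff and Inf, with a warning,
    # for mindiff, because diff() of a single date is empty.)
    new.times <- times[order(user, time),
                       .(first = min(time),
                         last = max(time),
                         n = .N,
                         meandiff = mean(diff(time)),
                         mindiff = min(diff(time)),
                         numdiffuniq = length(unique(diff(time)))),
                       by = 'user']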
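
A caveat with this kind of summary: a user with a single row has no gaps, so diff() returns an empty vector, mean() gives NaN and min() gives Inf with a warning. One possible guard, sketched here on the question's four complete rows, is to return NA when a user has only one row and to store gaps as plain numbers of days. The column choice and the name gap_summary are illustrative, not part of the original answer.

```r
library(data.table)

# Hypothetical sample: the four complete rows from the question.
times <- data.table(
  user = c("A", "B", "C", "A"),
  time = as.Date(c("7/7/2010", "7/12/2010", "7/12/2010", "7/12/2010"),
                 "%m/%d/%Y")
)

# Only compute gap statistics when a user has at least two rows;
# otherwise return NA_real_ so every group yields the same column type.
gap_summary <- times[order(user, time), .(
  n        = .N,
  meandiff = if (.N > 1) as.numeric(mean(diff(time)), units = "days") else NA_real_,
  mindiff  = if (.N > 1) as.numeric(min(diff(time)), units = "days") else NA_real_
), by = "user"]

print(gap_summary)
```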
    

    Running on a Linux virtual machine with 128 GB of RAM, on a sample of 1000 entries, the elapsed runtime was 0.43 s.
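
The answer does not say how the runtime was measured. One way to get a comparable elapsed time is base R's system.time(); the sketch below uses hypothetical synthetic data (1000 rows, 50 made-up users, random dates in 2010), so its timings are not the figure reported above.

```r
library(data.table)

# Hypothetical synthetic data: 1000 rows, 50 users, random dates in 2010.
set.seed(1)
n_rows <- 1000
times <- data.table(
  user = sample(sprintf("user%02d", 1:50), n_rows, replace = TRUE),
  time = as.Date("2010-01-01") + sample(0:364, n_rows, replace = TRUE)
)

# system.time() reports user, system and elapsed seconds;
# "elapsed" is the wall-clock time.
timing <- system.time(
  res <- times[order(user, time),
               .(first = min(time), last = max(time), n = .N),
               by = "user"]
)
print(timing[["elapsed"]])
```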

    See this tutorial for more on data.table.
