I have a dataframe times that looks like this:
user time
A 7/7/2010
B 7/12/2010
C 7/12/2010
A 7/12/2010
C 7/
Based on the dplyr solution by eipi10 and the suggestion of nrussell, I've written the following solution using data.table.
First, parse the time column of times as dates:
times$time = as.Date(times$time, "%m/%d/%Y")
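As a quick check of that conversion, here is a minimal sketch on two hypothetical strings in the same month/day/year form as the question (the variable names raw and parsed are made up for illustration):

```r
# Hypothetical sample values in month/day/year form
raw <- c("7/7/2010", "7/12/2010")

# %m = month, %d = day, %Y = four-digit year
parsed <- as.Date(raw, format = "%m/%d/%Y")

class(parsed)   # "Date"
format(parsed)  # "2010-07-07" "2010-07-12"
diff(parsed)    # Time difference of 5 days
```

Once the column is a Date, diff() returns gaps in days, which is what the summary further down relies on.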
Then convert times to a data.table:
library(data.table)
times <- as.data.table(times)
Overwriting times was useful for my purposes, but you may prefer to assign the result to a new variable. Once times is a data.table, run:
# Summarise each user's dates; by= belongs outside the .() list
new.times <- times[,
  .(first = min(time),
    last = max(time),
    n = .N,
    meandiff = mean(diff(time)),
    mindiff = min(diff(time)),
    numdiffuniq = length(unique(diff(time)))),
  by = 'user']
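To try the whole pipeline end to end, here is a self-contained sketch on made-up data. The rows are hypothetical (not the question's data) and are chosen so that every user has at least two dates, because diff() is empty for a single-row group and min() would then warn. Sorting with setorder() and wrapping diff() in as.numeric() are my additions, assuming you want gaps between consecutive dates as plain numbers of days.

```r
library(data.table)

# Hypothetical sample data: each user has at least two rows
times <- data.table(
  user = c("A", "B", "A", "B", "A"),
  time = c("7/7/2010", "7/12/2010", "7/12/2010", "7/20/2010", "7/17/2010")
)
times[, time := as.Date(time, format = "%m/%d/%Y")]

# Sort so that diff() measures gaps between consecutive dates per user
setorder(times, user, time)

new.times <- times[,
  .(first = min(time),
    last = max(time),
    n = .N,
    meandiff = mean(as.numeric(diff(time))),   # mean gap in days
    mindiff = min(as.numeric(diff(time))),     # smallest gap in days
    numdiffuniq = length(unique(diff(time)))), # number of distinct gaps
  by = user]

print(new.times)
```

With this data, user A has three dates spaced 5 days apart and user B has two dates 8 days apart, so you should see n of 3 and 2 and mean gaps of 5 and 8 days.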
Running on a Linux virtual machine with 128 GB of RAM and using a sample of 1000 entries, the elapsed runtime was 0.43 s.
See this tutorial for more on data.table.