Sorry, another newbie question. I am trying to take parts of a data frame based on an existing ID or index, and then create a new ID or index column based on the difference i
library(plyr)
# Within each userID, a gap of more than 30 between consecutive
# timestamps starts a new session; cumsum() numbers the sessions.
# diff() also handles a user with a single row, where 1:(n - 1) would fail.
ddply(myDF, .(userID), transform,
      sessID3 = paste(userID,
                      c(0, cumsum(diff(timeStamp) > 30)),      # sessions count from 0
                      sep = '.'),
      sessID4 = paste(userID,
                      c(0, cumsum(diff(timeStamp) > 30)) + 1,  # sessions count from 1
                      sep = '.'))
Gives me:
#    userID timeStamp var1 var2 varN sessID1 sessID2 sessID3 sessID4
#  1      1         1    x    y    N     1.0     1.1     1.0     1.1
#  2      1         3    x    y    N     1.0     1.1     1.0     1.1
#  3      1         6    x    y    N     1.0     1.1     1.0     1.1
#  4      1        40    x    y    N     1.1     1.2     1.1     1.2
#  5      1        42    x    y    N     1.1     1.2     1.1     1.2
#  6      1        43    x    y    N     1.1     1.2     1.1     1.2
#  7      1        47    x    y    N     1.1     1.2     1.1     1.2
#  8      2         5    x    y    N     2.0     2.1     2.0     2.1
#  9      2         8    x    y    N     2.0     2.1     2.0     2.1
# 10      3         2    x    y    N     3.0     3.1     3.0     3.1
# 11      3         5    x    y    N     3.0     3.1     3.0     3.1
# 12      3        38    x    y    N     3.1     3.2     3.1     3.2
# 13      3        39    x    y    N     3.1     3.2     3.1     3.2
# 14      3        39    x    y    N     3.1     3.2     3.1     3.2
# 15      3        82    x    y    N     3.2     3.3     3.2     3.3
# 16      3        83    x    y    N     3.2     3.3     3.2     3.3
# 17      3        90    x    y    N     3.2     3.3     3.2     3.3
# 18      3        91    x    y    N     3.2     3.3     3.2     3.3
# 19      3       102    x    y    N     3.2     3.3     3.2     3.3
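If you want to try this without the original data, here is a minimal sketch that rebuilds only the userID and timeStamp columns from the output above (the var columns and the existing sessID1/sessID2 are left out) and computes the same index in base R with ave(). The column name sess is made up for this sketch, and it assumes the rows are already sorted by timeStamp within each user.

```r
# Rebuild the two columns that matter from the printed output above.
myDF <- data.frame(
  userID    = c(rep(1, 7), rep(2, 2), rep(3, 10)),
  timeStamp = c(1, 3, 6, 40, 42, 43, 47,
                5, 8,
                2, 5, 38, 39, 39, 82, 83, 90, 91, 102)
)

# Per user: a gap > 30 between consecutive rows starts a new session.
# 'sess' is a hypothetical helper column used only in this sketch.
myDF$sess <- ave(myDF$timeStamp, myDF$userID,
                 FUN = function(t) c(0, cumsum(diff(t) > 30)))

# Combine user and session index into a character ID such as "1.0".
myDF$sessID3 <- paste(myDF$userID, myDF$sess, sep = ".")
```

The resulting sessID3 should match the sessID3 column shown above, and a user with only one row simply gets session 0.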
And a "data table" way...
library(data.table)
myDT <- data.table(myDF)
setkey(myDT,userID)
myDT[,sessID3:=paste(userID,cumsum(c(0,diff(timeStamp)>30)),sep="."),by=userID]
all.equal(myDT$sessID1,as.numeric(myDT$sessID3))
# [1] TRUE
Explanation:
Using by=userID with data.table groups the rows by userID. Using diff(timeStamp)>30 creates a logical vector with one fewer element than the number of rows in the group, so we prepend 0 with c(0,diff(timeStamp)>30). Using cumsum(c(0,diff(timeStamp)>30)) coerces the logical values to numbers and calculates the cumulative sum, so every time we encounter a diff > 30 the cumsum increments by 1. Finally, paste(...) just concatenates the userID with the secondary index.
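To see the pieces one at a time, here is a small sketch that runs each step on the timestamps of userID 3 from the output above. The intermediate names gaps, newSess and sessIdx are only for illustration.

```r
# Timestamps for userID 3, copied from the output above.
timeStamp <- c(2, 5, 38, 39, 39, 82, 83, 90, 91, 102)

gaps    <- diff(timeStamp)         # 3 33 1 0 43 1 7 1 11  (9 gaps for 10 rows)
newSess <- gaps > 30               # TRUE at the 2nd and 5th gap
sessIdx <- cumsum(c(0, newSess))   # 0 0 1 1 1 2 2 2 2 2

# Glue the user ID and the session index together.
paste(3, sessIdx, sep = ".")
```

The leading 0 is what lines the index up with the rows: the first row of each user has no previous timestamp, so it always starts in session 0.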
One note: you have it set up so that the sessID is numeric. This gets a bit dicey if there are more than 10 sessions for a given user, because as numbers 1.1 and 1.10 are the same value. IMO better to use character for sessID.
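A quick sketch of the problem, using made-up IDs for the 2nd and 11th session of user 1 (indices 1 and 10):

```r
# Hypothetical session IDs for user 1: session index 1 and session index 10.
idA <- paste(1, 1, sep = ".")    # "1.1"
idB <- paste(1, 10, sep = ".")   # "1.10"

# As character strings the two sessions stay distinct.
sameAsChar <- idA == idB                          # FALSE

# Converted to numeric, they collapse into the same value.
sameAsNum <- as.numeric(idA) == as.numeric(idB)   # TRUE
```

So any grouping or merging done on a numeric sessID would silently lump those two sessions together.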