My dataframe looks like this. The two rightmost columns are my desired columns.
Name ActivityType ActivityDate Email(last 21 days) Webinar(last
Here is another option with base R: df is first split according to Name and then, within each subset, for each Sale, it checks whether there is an Email (or a Webinar) in the 21 days before the Sale. Finally, the list is unsplit according to Name.
You just have to replace FALSE by no and TRUE by yes afterwards.
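df_split <- split(df, df$Name)
df_split <- lapply(df_split, function(tab){
  i_s <- which(tab$ActivityType == "Sale")
  # TRUE if an activity of the given type falls in the 21 days up to date d_s
  in21 <- function(d_s, type){
    d <- tab$ActivityDate[tab$ActivityType == type]
    any(d >= d_s - 21 & d <= d_s)
  }
  # initialise with NA so that non-Sale rows stay NA, whatever the row order
  tab$Email21 <- NA
  tab$Webinar21 <- NA
  tab$Email21[i_s] <- vapply(tab$ActivityDate[i_s], in21, logical(1), type = "Email")
  tab$Webinar21[i_s] <- vapply(tab$ActivityDate[i_s], in21, logical(1), type = "Webinar")
  tab
})
df_res <- unsplit(df_split, df$Name)
df_res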
# Name ActivityType ActivityDate Email21 Webinar21
#1 John Email 2014-01-01 NA NA
#2 John Webinar 2014-01-05 NA NA
#3 John Sale 2014-01-20 TRUE TRUE
#4 John Webinar 2014-03-25 NA NA
#5 John Sale 2014-04-01 FALSE TRUE
#6 John Sale 2014-07-01 FALSE FALSE
#7 Tom Email 2015-01-01 NA NA
#8 Tom Webinar 2015-01-05 NA NA
#9 Tom Sale 2015-01-20 TRUE TRUE
#10 Tom Webinar 2015-03-25 NA NA
#11 Tom Sale 2015-04-01 FALSE TRUE
#12 Tom Sale 2015-07-01 FALSE FALSE
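The replacement of FALSE by no and TRUE by yes mentioned above could look roughly like this. It is only a sketch: res is a small made-up stand-in with the same column layout as df_res, and flag_cols is a helper name introduced here for illustration.

```r
# Toy stand-in for df_res (hypothetical values, same flag columns)
res <- data.frame(
  ActivityType = c("Email", "Sale", "Sale"),
  Email21      = c(NA, TRUE, FALSE),
  Webinar21    = c(NA, FALSE, TRUE)
)

flag_cols <- c("Email21", "Webinar21")

# ifelse() leaves NA as NA, so the non-Sale rows stay missing
res[flag_cols] <- lapply(res[flag_cols], function(x) ifelse(x, "yes", "no"))
res
```

The same two lines should work on df_res itself, since only the two logical columns are touched.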
data
df <- structure(list(Name = c("John", "John", "John", "John", "John",
"John", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom"), ActivityType = c("Email",
"Webinar", "Sale", "Webinar", "Sale", "Sale", "Email", "Webinar",
"Sale", "Webinar", "Sale", "Sale"), ActivityDate = structure(c(16071,
16075, 16090, 16154, 16161, 16252, 16436, 16440, 16455, 16519,
16526, 16617), class = "Date")), .Names = c("Name", "ActivityType",
"ActivityDate"), row.names = c(NA, -12L), index = structure(integer(0), ActivityType = c(1L,
7L, 3L, 5L, 6L, 9L, 11L, 12L, 2L, 4L, 8L, 10L)), class = "data.frame")
Here's a possible data.table solution. Here I'm creating 2 temporary data sets, one for Sale and one for the rest of the activity types, and then joining between them by a rolling window of 21 days while using by = .EACHI in order to check conditions in each join. Then, I'm joining the result to the original data set.
Convert the date column to Date class and key the data by Name and Date (for the final/rolling join)
library(data.table)
setkey(setDT(df)[, ActivityDate := as.IDate(ActivityDate, "%m/%d/%Y")], Name, ActivityDate)
Create 2 temporary data sets, one for sales and one for every other activity type
Saletemp <- df[ActivityType == "Sale", .(Name, ActivityDate)]
Elsetemp <- df[ActivityType != "Sale", .(Name, ActivityDate, ActivityType)]
Join by a rolling window of 21 to the sales temporary data set while checking conditions
Saletemp[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")),
                        Webinar21 = as.logical(which(i.ActivityType == "Webinar"))),
         roll = -21, by = .EACHI]
Join everything back to the original data set (sales with no matching activity keep NA rather than FALSE)
df[Saletemp, `:=`(Email21 = i.Email21, Webinar21 = i.Webinar21)]
df
# Name ActivityType ActivityDate Email21 Webinar21
# 1: John Email 2014-01-01 NA NA
# 2: John Webinar 2014-01-05 NA NA
# 3: John Sale 2014-01-20 TRUE TRUE
# 4: John Webinar 2014-03-25 NA NA
# 5: John Sale 2014-04-01 NA TRUE
# 6: John Sale 2014-07-01 NA NA
# 7: Tom Email 2015-01-01 NA NA
# 8: Tom Webinar 2015-01-05 NA NA
# 9: Tom Sale 2015-01-20 TRUE TRUE
# 10: Tom Webinar 2015-03-25 NA NA
# 11: Tom Sale 2015-04-01 NA TRUE
# 12: Tom Sale 2015-07-01 NA NA
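To see in isolation what roll = -21 does in the join above, here is a minimal sketch on made-up rows. The tables sales and acts and the column SaleDate are hypothetical names used only for this illustration; the point is that a negative roll carries the next sale backward to each activity, but only if that sale is at most 21 days later.

```r
library(data.table)

# Hypothetical toy data: two sales and two earlier activities for one person
sales <- data.table(
  Name         = "John",
  ActivityDate = as.IDate(c("2014-01-20", "2014-07-01")),
  key          = c("Name", "ActivityDate")
)
sales[, SaleDate := ActivityDate]  # copy of the date, to see which sale matched

acts <- data.table(
  Name         = "John",
  ActivityDate = as.IDate(c("2014-01-01", "2014-03-25")),
  ActivityType = c("Email", "Webinar")
)

# For each row of acts, look up the next sale at most 21 days later
matched <- sales[acts, roll = -21, on = c("Name", "ActivityDate")]
matched[, .(ActivityType, ActivityDate, SaleDate)]
```

With these rows the Email on 2014-01-01 should pick up the sale on 2014-01-20 (19 days later), while the Webinar on 2014-03-25 has no sale within 21 days and so gets NA in SaleDate.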