I want to add ID variable in this data. If receipt_ids are sequenced numbers, then those have same IDs.
CUST_NO_ID receipt_id dollar
12 29
Assuming your data frame has already been sorted by CUST_NO_ID
and receipt_id
, you can use cumsum
on a conditional vector where TRUE indicates the position a new ID should be created:
df$ID = cumsum(c(T, diff(df$receipt_id) != 1 | diff(df$CUST_NO_ID) != 0)))
df
# CUST_NO_ID receipt_id dollar ID
#1 12 29 20.84 1
#2 12 30 20.21 1
#3 12 86 24.50 2
#4 12 87 20.68 2
#5 12 108 25.79 3
#6 12 109 24.93 3
#7 12 125 20.63 4
#8 12 126 9.90 4
#9 19 193 69.48 5
#10 19 194 46.88 5
Had a similar notion to @Psidom, but he beat me to the punch with cumsum
. Here's a dplyr
solution. Adding in group_by
can give you added flexibility if you want to restart ids by customer number.
df %>%
mutate(id = cumsum(c(TRUE, diff(receipt_id) != 1)))
We can use data.table
library(data.table)
setDT(df)[, id := cumsum(c(TRUE, diff(receipt_id)!=1))]
Or use the shift
setDT(df)[, id := cumsum((receipt_id - shift(receipt_id, fill=receipt_id[1]))!=1)]
This does it
id <- 1
for(row in 1:nrow(data)){
if(row == 1){
dif <- 1
}else{
dif <- data[row,'receipt_id'] - data[row-1,'receipt_id']
}
if(dif != 1){
id = id + 1
}
data[row,'ID'] = id
}