Question
I am looking for help writing a function that can identify a trend ("positive/negative/mixed", see definition below) in a value for a given customer in a dataset.
I have the following transactional data; each customer has between 3 and 13 transactions.
customer_ID transaction_num sales
Josh 1 $35
Josh 2 $50
Josh 3 $65
Ray 1 $65
Ray 2 $52
Ray 3 $49
Ray 4 $15
Eric 1 $10
Eric 2 $13
Eric 3 $9
I would like to write a function in R that populates a new dataframe as follows
Customer_ID Sales_Slope
Josh Positive
Ray Negative
Eric Mixed
where:
Josh's slope is positive because his sales increase with every additional transaction.
Ray's slope is negative because his sales decrease with every additional transaction.
Eric's slope is mixed because his sales fluctuate with no clear trend.
I have tried quite extensively to do this myself but am stuck. Here is some pseudo-code I have been able to put together:
counter = max(transaction_num)
while counter >= 0
if (sales at max transaction_num are greater than sales at max transaction_num - 1)
then counter = counter - 1 ; else "not positive slope trend"
Answer 1:
I think I would start with something like this. data.table is usually pretty efficient with bigger datasets.
#Make fake data
require("data.table")
data <- data.table(customer_ID=c(rep("Josh",3),rep("Ray",4),rep("Eric",3)),
sales=c(35,50,65,65,52,49,15,10,13,9))
data[,transaction_num:=seq(1,.N),by=c("customer_ID")]
Now for the actual code.
data <- data.table(data)
#Calculate difference in rolling two time periods
rolled.up <- data[,list(N.Minus.1=.N-1,Change=list(
sales[transaction_num+1]-sales[transaction_num])),
by=c("customer_ID")]
#Sum up positive and negative values
rolled.up[,Pos.Values:=sapply(Change,function(x) sum(x>0,na.rm=TRUE))]
#Count strict decreases directly so that unchanged sales are not treated as negative
rolled.up[,Neg.Values:=sapply(Change,function(x) sum(x<0,na.rm=TRUE))]
#Make Sales Slope variable
rolled.up[,Sales_Slope:=ifelse(Pos.Values>0 & Neg.Values==0,"Positive",
ifelse(Pos.Values==0 & Neg.Values>0,"Negative","Mixed"))]
#Make final table
final.table <- rolled.up[,list(customer_ID,Sales_Slope)]
final.table
# customer_ID Sales_Slope
# 1: Josh Positive
# 2: Ray Negative
# 3: Eric Mixed
#You can always merge this result back onto your main dataset if you want
data <- merge(x=data,y=final.table,by=c("customer_ID"),all.x=TRUE)
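The same classification can be sketched more compactly by applying diff() within each customer group. This is only a sketch under a few assumptions: classify_trend is a hypothetical helper name, not part of data.table; a customer with unchanged sales between two transactions is treated as "Mixed"; and rows are sorted by transaction_num before grouping.

```r
library(data.table)

# Hypothetical helper: classify a numeric vector by the signs of its
# successive differences. Ties (a difference of zero) fall through to "Mixed".
classify_trend <- function(x) {
  d <- diff(x)
  if (length(d) > 0 && all(d > 0)) return("Positive")
  if (length(d) > 0 && all(d < 0)) return("Negative")
  "Mixed"
}

# Sample data from the question
dt <- data.table(
  customer_ID = c(rep("Josh", 3), rep("Ray", 4), rep("Eric", 3)),
  transaction_num = c(1:3, 1:4, 1:3),
  sales = c(35, 50, 65, 65, 52, 49, 15, 10, 13, 9)
)

# Order rows by transaction number, then classify each customer's sales
result <- dt[order(transaction_num),
             .(Sales_Slope = classify_trend(sales)),
             by = customer_ID]
result
```

On the sample data this labels Josh as Positive, Ray as Negative, and Eric as Mixed.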
Answer 2:
The simple answer is to use diff. It subtracts each value from the one that follows it, so if every element of diff(x) is above zero, the series is increasing, and vice versa. First, read the data:
# Read in some data.
data<-read.table(textConnection('customer_ID transaction_num sales
Josh 1 $35
Josh 2 $50
Josh 3 $65
Ray 1 $65
Ray 2 $52
Ray 3 $49
Ray 4 $15
Eric 1 $10
Eric 2 $13
Eric 3 $9'),header=TRUE,stringsAsFactors=FALSE)
data$sales<-as.numeric(sub('\\$','',data$sales))
Now the code:
# diff() returns each value minus the one before it,
# so diff(c(1,2,3,4)) is c(1,1,1)
direction<-function(x){
if(all(diff(x)>0)) return('Increasing')
if(all(diff(x)<0)) return('Decreasing')
return('Mixed')
}
# If you want a vector.
c(by(data$sales,data$customer_ID,direction))
# Eric Josh Ray
# "Mixed" "Increasing" "Decreasing"
# If you want a small data frame.
aggregate(sales~customer_ID,data,direction)
# customer_ID sales
# 1 Eric Mixed
# 2 Josh Increasing
# 3 Ray Decreasing
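To get the exact column names and labels requested in the question, the same idea can be adapted as below. This is a sketch with some assumptions: sales_slope is a hypothetical variant of direction(), the sales column is already numeric, and rows are sorted first because diff() only makes sense on consecutive transactions.

```r
# Sample data from the question, with sales already numeric
data <- data.frame(
  customer_ID = c(rep("Josh", 3), rep("Ray", 4), rep("Eric", 3)),
  transaction_num = c(1:3, 1:4, 1:3),
  sales = c(35, 50, 65, 65, 52, 49, 15, 10, 13, 9),
  stringsAsFactors = FALSE
)

# Hypothetical variant of direction() using the labels from the question.
# The length check keeps a single-transaction customer from being
# classified as "Positive", since all() of an empty vector is TRUE.
sales_slope <- function(x) {
  d <- diff(x)
  if (length(d) > 0 && all(d > 0)) return("Positive")
  if (length(d) > 0 && all(d < 0)) return("Negative")
  "Mixed"
}

# Sort by customer and transaction number before taking differences
data <- data[order(data$customer_ID, data$transaction_num), ]

# One row per customer, renamed to match the requested output
result <- aggregate(sales ~ customer_ID, data, sales_slope)
names(result) <- c("Customer_ID", "Sales_Slope")
result
```

Note that aggregate() returns customers in sorted order (Eric, Josh, Ray), and that two equal consecutive sales values make a customer "Mixed" under this definition.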
Source: https://stackoverflow.com/questions/23600385/how-to-determine-trend-of-time-series-of-values-in-r