TraMineR: Can I get the complete sequence if I give an event sub sequence?

Question

I have a sequence dataset like below:

customerid    flag  0   1   2   3   4   5   6   7   8   9   10  11
abc234          1   3   4   3   4   5   8   4   3   3   2   14  14
abc233          0   4   4   4   4   4   4   4   4   4   4   4   4
qpr81           0   9   8   7   8   8   7   8   8   7   8   8   7
qnr94           0   14  14  14  2   14  14  14  14  14  14  14  14

Values in columns 0 to 11 are the sequences. There are two sets of customers, with flag=1 and flag=0, and I have the differentiating event subsequences for both sets (only the frequencies and residuals for the two groups are shown here):

Subsequence Freq.0      Freq.1     Resid.0       Resid.1
(3>4)       0.19208177  0.0753386   5.540793    -21.43304
(4>5)       0.15752553  0.059960497 5.115241    -19.78691
(5>4)       0.15950556  0.062782167 5.037413    -19.48586

I want to find the customer ids and flags of the sequences that contain these event subsequences.

Should I write a Python script to traverse the transactions, or is there a direct method in R to do this?

CODE
--------------

library(TraMineR)

custid=c("a1","a2","a3","b4","b5","c6","c7","d8","d9")#sample customer ids (quoted, as they are strings)
flag=c(0,0,0,1,0,1,1,0,1)#flag
col1=c(14,14,14,14,14,5,14,14,2)
col2=c(14,14,3,14,3,14,6,3,3)
col3=c(14,2,2,14,2,14,2,2,2)
col4=c(14,2,2,14,2,14,2,2,14)
df=data.frame(custid,flag,col1,col2,col3,col4)#dataframe generation
print(df)
#Defining sequence from col1 to col4
df.s<-seqdef(df,3:6)
print(df.s)
#finding the transitions
transition<-seqetm(df.s,method='transition')
print(transition)
#converting to TSE format (a seqdef object is in STS format)
df.tse=seqformat(df.s,from='STS',to='TSE',tevent = transition)
print(df.tse)
#Event sequence generation
df.seqe=seqecreate(id=df.tse$id,timestamp=df.tse$time,event=df.tse$event)
print(df.seqe)
#subsequences
fsubseq <- seqefsub(df.seqe, pMinSupport = 0.01)
print(fsubseq)
groups <- factor(df$flag, levels=c(0,1))#grouping factor with levels 0 and 1 (seqecmpgroup below uses df$flag directly)
#finding differentiating event sequences based on flag using ChiSquare test
diff <- seqecmpgroup(fsubseq, group = df$flag, method = "chisq")

#Using seqeapplysub for finding the presence of subsequences?
presence=seqeapplysub(fsubseq,method="presence")
print(presence[1:3,3:1])

Thanks

Answer 1:

From what I understand, you have state sequences and have transformed them into event sequences using the seqecreate function of TraMineR. The events you are considering are the state changes. Thus (3>4) stands for a subsequence with only one event, namely the event 3>4 (switching from state 3 to state 4). You then identify the event subsequences that best discriminate between your two flag groups using the seqefsub and seqecmpgroup functions.
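To make this event definition concrete, here is a minimal base-R sketch (it does not call TraMineR) that lists the state-change events of one state sequence, using customer abc234 from the question's table. It assumes that only state changes count as events; the transition matrix built by seqetm may also define an event for the initial state, which this sketch leaves out. The helper name to_events is hypothetical.

```r
# Sketch: list the "from>to" state-change events of one state sequence.
# It mimics the idea of transition events; it does not call TraMineR and
# adds no event for the initial state (an assumption of this sketch).
to_events <- function(states) {
  # rle() collapses runs of repeated states, e.g. 3 3 2 becomes 3 2
  v <- rle(as.vector(states))$values
  if (length(v) < 2L) return(character(0))  # the state never changes
  # pair each state with the next distinct state: "3>4", "4>3", ...
  paste(head(v, -1L), tail(v, -1L), sep = ">")
}

# Customer abc234 from the question's table (positions 0 to 11)
abc234 <- c(3, 4, 3, 4, 5, 8, 4, 3, 3, 2, 14, 14)
ev <- to_events(abc234)
ev               # 9 events, from "3>4" to "2>14"
"3>4" %in% ev    # TRUE: abc234 contains the one-event subsequence (3>4)

# Customer abc233 stays in state 4 throughout, so it has no such events
to_events(rep(4, 12))  # character(0)
```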

If this is correct, you can identify the sequences that contain each subsequence with the seqeapplysub function. I cannot illustrate this here because your question does not include any code; see the online help of seqeapplysub.

======= update referring to your added code =======

Here is how you get the ids of the sequences that contain the most discriminating subsequence.

First, we extract the three most discriminating subsequences from your diff object. Second, we compute the presence matrix, which has one column per extracted subsequence and one row per sequence, with a 1 where the sequence contains the subsequence and 0 otherwise.

diffseq <- seqefsub(df.seqe, strsubseq = paste(diff$subseq[1:3]))
(presence=seqeapplysub(diffseq, method="presence"))
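For intuition about what that presence matrix holds, the following base-R sketch recomputes a comparable 0/1 matrix for the sample data in the question's code. It is an approximation, not TraMineR's algorithm: it assumes transition events only, counts a multi-event subsequence as present when its events occur in the given order (not necessarily adjacent), and ignores time constraints and any initial-state event. The helpers to_events and contains_subseq and the column labels are hypothetical.

```r
# Sample ids and states (col1 to col4) from the question's CODE section
custid <- c("a1", "a2", "a3", "b4", "b5", "c6", "c7", "d8", "d9")
states <- cbind(c(14, 14, 14, 14, 14, 5, 14, 14, 2),
                c(14, 14, 3, 14, 3, 14, 6, 3, 3),
                c(14, 2, 2, 14, 2, 14, 2, 2, 2),
                c(14, 2, 2, 14, 2, 14, 2, 2, 14))

# State-change events of one row (no event for the initial state)
to_events <- function(x) {
  v <- rle(as.vector(x))$values
  if (length(v) < 2L) character(0) else paste(head(v, -1L), tail(v, -1L), sep = ">")
}

# TRUE if the events in `pattern` occur in `ev` in that order,
# not necessarily adjacent (greedy earliest match, no time constraints)
contains_subseq <- function(ev, pattern) {
  pos <- 0L
  for (p in pattern) {
    hit <- which(ev == p)
    hit <- hit[hit > pos]          # only occurrences after the previous match
    if (length(hit) == 0L) return(FALSE)
    pos <- hit[1L]
  }
  TRUE
}

events <- lapply(seq_len(nrow(states)), function(i) to_events(states[i, ]))

# Illustrative subsequences; the labels are plain strings for readability
patterns <- list("(14>3)"       = "14>3",
                 "(3>2)"        = "3>2",
                 "(14>3)-(3>2)" = c("14>3", "3>2"),
                 "(3>2)-(14>3)" = c("3>2", "14>3"))

# One column per subsequence, one row per customer: 1 = present, 0 = absent
presence <- sapply(patterns, function(p)
  as.integer(vapply(events, contains_subseq, logical(1), pattern = p)))
rownames(presence) <- custid
print(presence)
```

Reading down a column gives the customers that contain that subsequence. The last column stays at 0 because, in this sample, 3>2 never occurs before 14>3, which shows that the order of events matters.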

Now you get the ids for the first subsequence with

custid[presence[,1]==1]

For the second it would be custid[presence[,2]==1] etc.

Likewise, you get the corresponding flags with

flag[presence[,1]==1]
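To collect the ids and flags for every extracted subsequence at once, the same indexing can be looped over the columns of presence. The sketch below is self-contained, so it rebuilds a small presence matrix in base R for single-event subsequences from the question's sample data, standing in for the seqeapplysub output; with the real TraMineR objects you would start from the existing presence, custid and flag instead. Like the indexing above, it assumes the rows of presence follow the order of custid and flag. The names subseqs and matches are hypothetical.

```r
# Sample ids, flags and states (col1 to col4) from the question's CODE section
custid <- c("a1", "a2", "a3", "b4", "b5", "c6", "c7", "d8", "d9")
flag   <- c(0, 0, 0, 1, 0, 1, 1, 0, 1)
states <- cbind(c(14, 14, 14, 14, 14, 5, 14, 14, 2),
                c(14, 14, 3, 14, 3, 14, 6, 3, 3),
                c(14, 2, 2, 14, 2, 14, 2, 2, 2),
                c(14, 2, 2, 14, 2, 14, 2, 2, 14))

# Stand-in for the presence matrix: state-change events per customer,
# then 1/0 for whether each single-event subsequence occurs
events <- lapply(seq_len(nrow(states)), function(i) {
  v <- rle(as.vector(states[i, ]))$values
  if (length(v) < 2L) character(0) else paste(head(v, -1L), tail(v, -1L), sep = ">")
})
subseqs  <- c("14>3", "3>2", "5>14")
presence <- sapply(subseqs, function(s)
  as.integer(vapply(events, function(e) s %in% e, logical(1))))

# For every subsequence (column), keep the matching ids and their flags
matches <- lapply(colnames(presence), function(s) {
  hit <- presence[, s] == 1
  data.frame(custid = custid[hit], flag = flag[hit], stringsAsFactors = FALSE)
})
names(matches) <- colnames(presence)

print(matches[["3>2"]])
```

Here matches[["3>2"]] lists a3, b5 and d8 (flag 0) and d9 (flag 1), so a single call gives both the ids and the flags for each subsequence.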

Hope this helps.

Source: https://stackoverflow.com/questions/40124187/traminer-can-i-get-the-complete-sequence-if-i-give-an-event-sub-sequence

Tags

traminer