Question
Is it possible to load a list of frequent subsequences from a .txt file, and make TraMineR recognize it as a sequence object?
Unfortunately, I don't have the raw data, so I cannot recreate the analysis. The only file that I have is a .txt file containing the frequent subsequences. I assume it was created with the seqefsub() function from the TraMineR package, with maxGap=2, because the data looks like the output of that function.
read.table() reads it as a data frame, but as far as I understand, TraMineR handles event sequences as lists with many additional attributes that are not contained in this file. Or I don't know how to extract them...
This is what a couple of lines from the .txt file look like:
Subsequence Support Count
16 (WT4)-(WT3) 0.76666667 805
17 (WL2) 0.76380952 802
18 (S1) 0.76000000 798
19 (FRF,WL2) 0.74380952 781
20 (WT2)-(WT1) 0.70571429 741
Answer 1:
To create an event sequence object from the (text) subsequences, you have to transform them into vertical time-stamped event (TSE) form. The function below does the job for your data.
## Function subseq.to.TSE
## Puts the subsequences into TSE format, using the
## position of each transition as the timestamp.
## sdf: a data frame with columns Id, Subsequence, Support and Count.
subseq.to.TSE <- function(sdf){
  rows <- vector("list", nrow(sdf))
  for (i in seq_len(nrow(sdf))){
    ## drop the parentheses, then split the transitions on "-"
    ss <- gsub("[()]", "", as.character(sdf[i,"Subsequence"]))
    st <- strsplit(ss, split="-", fixed=TRUE)[[1]]
    ## split simultaneous events on ","; they share the same timestamp
    ev <- strsplit(st, split=",", fixed=TRUE)
    rows[[i]] <- data.frame(id=sdf[i,"Id"],
                            event=unlist(ev),
                            time=rep(seq_along(ev), lengths(ev)),
                            stringsAsFactors=FALSE)
  }
  ## stack the per-sequence pieces and store the events as a factor
  tse <- do.call(rbind, rows)
  tse$event <- factor(tse$event)
  return(tse)
}
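To make the target format concrete, here is a minimal base-R sketch of the same parsing step applied to a single subsequence string. The helper name parse.one is hypothetical and only serves the illustration; it is not part of TraMineR.

```r
# parse.one is a hypothetical helper (not a TraMineR function).
# It turns one subsequence string into event/time rows.
parse.one <- function(s){
  # remove parentheses, then split transitions on "-"
  st <- strsplit(gsub("[()]", "", s), "-", fixed = TRUE)[[1]]
  # split simultaneous events on ","
  ev <- strsplit(st, ",", fixed = TRUE)
  data.frame(event = unlist(ev),
             time  = rep(seq_along(ev), lengths(ev)),
             stringsAsFactors = FALSE)
}

parse.one("(WT4)-(WT3)")  # two rows: WT4 at time 1, WT3 at time 2
parse.one("(FRF,WL2)")    # two rows: FRF and WL2, both at time 1
```

Events separated by "-" get consecutive timestamps, while events listed together inside one pair of parentheses share a timestamp.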
Here is how you would use it on the example data.
We first create the data frame, which we name s.df:
s.df <- data.frame(scan(what=list(Id=0, Subsequence="", Support=double(), Count=0)))
16 (WT4)-(WT3) 0.76666667 805
17 (WL2) 0.76380952 802
18 (S1) 0.76000000 798
19 (FRF,WL2) 0.74380952 781
20 (WT2)-(WT1) 0.70571429 741
# leave a blank line to end the scan
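If the subsequences sit in a file rather than being pasted at the console, read.table() could be used instead of scan(). The sketch below assumes the file has the layout shown in the question: a three-name header followed by rows with four fields. In that case read.table() treats the first field as row names, so the Id column has to be rebuilt from them. The text= argument stands in for the real file here; with an actual file you would pass its path instead.

```r
# Assumption: the file looks like the excerpt in the question.
# text= is used only to keep the example self-contained.
raw <- read.table(text = "Subsequence Support Count
16 (WT4)-(WT3) 0.76666667 805
17 (WL2) 0.76380952 802
18 (S1) 0.76000000 798
19 (FRF,WL2) 0.74380952 781
20 (WT2)-(WT1) 0.70571429 741",
                  header = TRUE, stringsAsFactors = FALSE)

# The header has one field fewer than the rows, so the leading
# numbers ended up as row names; turn them into an Id column.
s.df <- data.frame(Id = as.numeric(rownames(raw)), raw,
                   stringsAsFactors = FALSE)
rownames(s.df) <- NULL
str(s.df)
```

The resulting s.df has the columns Id, Subsequence, Support and Count that subseq.to.TSE expects.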
Then we extract the TSE data from s.df and create the event sequence object from it using seqecreate(). Finally, we assign the counts as sequence weights.
s.tse <- subseq.to.TSE(s.df)
seqe <- seqecreate(s.tse)
seqeweight(seqe) <- s.df[,"Count"]
Now you can, for instance, plot the event sequences with:
seqpcplot(seqe)
Source: https://stackoverflow.com/questions/29588200/load-frequent-subsequences-from-txt