Question
I have a conceptual problem with the substitution cost matrix. I am working on sequence dissimilarity using the R package TraMineR.
Here is a very simple example that I hope explains my problem:
There are three sequences and I want to calculate the dissimilarity matrix. The alphabet is: H (in health), I (ill at home), IH (ill at hospital), D (died).
I observe the 3 subjects at 5 time points. These are the sequences:
H – H – I – D – D
H – I – I – I – D
I – I – H – IH – IH
The substitution cost matrix is a 4x4 table (state x state). Must it be symmetric? This is my conceptual problem: while it is possible to "transit" from states H, I or IH to state D, the reverse is illogical.
Can I use a non-symmetric substitution cost matrix in TraMineR?
If, in my data, the substitution cost (calculated with sm = "TRATE", for instance) from state "I" to "D" is lower (0.5) than the substitution cost from state "I" to "IH" (0.6), the OM algorithm substitutes "I" with "D" instead of "IH".
Answer 1:
It seems to me that you're looking for a custom cost matrix. It is not mandatory to use either the TRATE or the CONSTANT method.
To create a custom matrix you'll just have to do something like this:
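# Example for a 3-state alphabet: one row and one column per state,
# zeros on the diagonal, and symmetric off-diagonal costs.
myscm <- matrix(c(0,1,2,
                  1,0,2,
                  2,2,0), nrow=3, ncol=3)
# Pass the custom matrix to the optimal matching distance via sm
dist.om <- seqdist(my.seq, method="OM", sm=myscm)
where myscm is your custom matrix and my.seq is your state sequence object. For the 4-state alphabet in the question, the matrix has to be 4x4.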
This was taken from http://lists.r-forge.r-project.org/pipermail/traminer-users/2011-July/000075.html
I believe you have two options:
1) Create a rationale for all the transitions and a full custom matrix
2) Take the substitution cost matrix you've already generated (using seqsubm(your.seq, method = "TRATE")) and change just the inconsistent values. That's what I did in my last analysis.
But keep in mind the point made by Gilbert in An "asymmetric" pairwise distance matrix
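As a rough illustration of option 1 applied to the question's data, the sketch below builds a 4x4 custom matrix and feeds it to seqdist. The cost values are made up purely for illustration (they are not from the thread), and indel = 1 is simply TraMineR's default written out explicitly. The matrix is kept symmetric, with the "died" state set farthest from all others.

```r
library(TraMineR)

# The three sequences from the question, one row per subject.
seqs <- rbind(c("H", "H", "I", "D",  "D"),
              c("H", "I", "I", "I",  "D"),
              c("I", "I", "H", "IH", "IH"))
my.seq <- seqdef(seqs, alphabet = c("H", "I", "IH", "D"))

# Illustrative costs only (an assumption, not a recommendation).
# Rows and columns follow the alphabet order H, I, IH, D.
states <- alphabet(my.seq)
myscm <- matrix(c(0,   1,   1.5, 2,
                  1,   0,   1,   2,
                  1.5, 1,   0,   2,
                  2,   2,   2,   0),
                nrow = 4, byrow = TRUE, dimnames = list(states, states))

# Optimal matching distances with the custom substitution costs.
dist.om <- seqdist(my.seq, method = "OM", indel = 1, sm = myscm)
print(dist.om)
```

Because the costs are symmetric, the resulting 3x3 matrix is symmetric as well and can be passed on to a clustering procedure.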
Answer 2:
The transition rates (estimated transition probabilities) should not be confused with the substitution costs. Substitution costs are supposed to reflect the dissimilarities between states.
The matrix of transition rates (returned by seqtrate) is NOT symmetric.
The substitution costs used to compute distances, such as the optimal matching distance, must be symmetric. Otherwise, the result would not be a distance matrix, and inputting such a non-symmetric matrix to, for example, a clustering procedure would lead to unexpected results.
Deriving substitution costs from transition rates is just one of several possibilities for defining substitution costs. Letting $p(i|j)$ be the probability of transiting from $j$ to $i$, it consists in defining the substitution cost as
$c(i,j) = 2 - p(i|j) - p(j|i)$
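To see why this formula always gives a symmetric matrix even when the transition rates are not, here is a small base R sketch. The transition rates are hypothetical numbers chosen for the example (with D as an absorbing state); they are not estimated from the question's data.

```r
# Hypothetical transition rates for the alphabet H, I, IH, D.
# tr[j, i] is the probability of moving from state j to state i,
# so each row sums to 1. D is absorbing: nobody leaves it.
states <- c("H", "I", "IH", "D")
tr <- matrix(c(0.7, 0.2, 0.1, 0.0,
               0.2, 0.5, 0.2, 0.1,
               0.1, 0.3, 0.4, 0.2,
               0.0, 0.0, 0.0, 1.0),
             nrow = 4, byrow = TRUE, dimnames = list(states, states))

# c(i, j) = 2 - p(i|j) - p(j|i): both directions enter the sum,
# so swapping i and j leaves the cost unchanged.
scm <- 2 - (tr + t(tr))
diag(scm) <- 0  # substituting a state for itself costs nothing

print(isSymmetric(tr))   # the transition rates are not symmetric
print(isSymmetric(scm))  # the derived substitution costs are
print(scm)
```

With these numbers the I/D cost is 2 - 0.1 - 0 = 1.9 in both directions, although the transition from D to I never occurs.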
Source: https://stackoverflow.com/questions/28586009/traminer-substitution-cost