Question
I ran an analysis with TraMineR to measure the similarity among sequences of spatial use, for example Rural (R) vs. Urban (U), giving sequences such as RRRRRUUURRUUU. My analysis requires that states be compared at the same point in time, so I used the Hamming distance. From the resulting distance matrix I built a dendrogram showing the distances among individual sequences, which helps to identify "behavioral similarities" in sequential spatial use. Now I am looking for a way to assess the robustness or reliability of the tree. Does anybody know how I can compute a bootstrap tree (with bootstrap values indicated along the branches)?
Kind regards,
Johannes
Answer 1:
The fpc package has a function called clusterboot that can be used to assess the stability of a clustering procedure. It can be used in the following way:
library(TraMineR)
data(mvad)
## Use some sequence data to illustrate
mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school", "training")
mvad.labels <- c("employment", "further education", "higher education", "joblessness", "school", "training")
mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, alphabet = mvad.alphabet, states = mvad.scodes, labels = mvad.labels, xtstep = 6)
## Compute Hamming distances
ham <- seqdist(mvad.seq, method="HAM")
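library(fpc)
## Bootstrap the clustering: average-linkage hierarchical clustering, tree cut into 5 clusters
cf2 <- clusterboot(as.dist(ham), clustermethod = disthclustCBI, k = 5, cut = "number", method = "average")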
print(cf2)
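To make the idea behind the clusterboot call above more concrete, here is a minimal base R sketch of the same principle: resample the cases, re-cluster, and record for each original cluster the best Jaccard similarity found among the new clusters. It is not the fpc implementation. The toy R/U sequences, the helper names (hamming, cluster_k, jaccard) and the choice to drop duplicated cases from each resample are all assumptions made for illustration.

```r
# Sketch only: mimics the idea behind clusterboot() with base R, not its exact algorithm.
set.seed(1)

# Made-up toy data: 40 sequences of length 12 over the states R and U
n <- 40
len <- 12
seqs <- rbind(
  matrix(sample(c("R", "U"), (n / 2) * len, replace = TRUE, prob = c(0.9, 0.1)), ncol = len),
  matrix(sample(c("R", "U"), (n / 2) * len, replace = TRUE, prob = c(0.1, 0.9)), ncol = len)
)

# Hamming distance: number of positions at which two sequences differ
hamming <- function(m) {
  size <- nrow(m)
  d <- matrix(0, size, size)
  for (i in seq_len(size)) for (j in seq_len(size)) d[i, j] <- sum(m[i, ] != m[j, ])
  d
}

# Average-linkage hierarchical clustering, tree cut into k groups
cluster_k <- function(d, k) cutree(hclust(as.dist(d), method = "average"), k = k)

# Jaccard similarity between two sets of case indices
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

d <- hamming(seqs)
k <- 2
orig <- cluster_k(d, k)

B <- 50                               # number of resamples
jac <- matrix(NA_real_, B, k)
for (b in seq_len(B)) {
  idx <- unique(sample(n, replace = TRUE))   # resampled cases, duplicates dropped (simplification)
  boot <- cluster_k(d[idx, idx], k)
  for (cl in seq_len(k)) {
    orig_cl <- intersect(which(orig == cl), idx)  # original cluster restricted to resampled cases
    if (length(orig_cl) == 0) next                # cluster absent from this resample: leave NA
    jac[b, cl] <- max(sapply(seq_len(k), function(g) jaccard(orig_cl, idx[boot == g])))
  }
}

# Clusterwise mean Jaccard similarity over the resamples
mean_jaccard <- colMeans(jac, na.rm = TRUE)
print(round(mean_jaccard, 2))
```

Note that this measures the stability of the clusters obtained after cutting the tree, not of every individual branch of the dendrogram.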
The clusterboot help page provides the following guidelines for interpreting the values:
There is some theoretical justification to consider a Jaccard similarity value smaller or equal to 0.5 as an indication of a "dissolved cluster", see Hennig (2008). Generally, a valid, stable cluster should yield a mean Jaccard similarity value of 0.75 or more. Between 0.6 and 0.75, clusters may be considered as indicating patterns in the data, but which points exactly should belong to these clusters is highly doubtful. Below average Jaccard values of 0.6, clusters should not be trusted. "Highly stable" clusters should yield average Jaccard similarities of 0.85 and above.
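As a reading aid, the thresholds quoted above can be turned into a small labelling helper. This is only a sketch: jaccard_label is a hypothetical function (not part of fpc), the category names are my own shorthand, and the example values are made up. With the code from the answer, the clusterwise means should be available as cf2$bootmean.

```r
# Hypothetical helper, not part of fpc: label clusterwise mean Jaccard values
# using the thresholds quoted from the clusterboot help page.
jaccard_label <- function(bootmean) {
  ifelse(bootmean <= 0.5, "dissolved",
  ifelse(bootmean < 0.6, "not trusted",
  ifelse(bootmean < 0.75, "pattern, membership doubtful",
  ifelse(bootmean < 0.85, "stable", "highly stable"))))
}

# Made-up values for illustration; in practice pass cf2$bootmean instead
example_means <- c(0.45, 0.58, 0.70, 0.80, 0.91)
stability <- data.frame(
  cluster = seq_along(example_means),
  mean_jaccard = example_means,
  label = jaccard_label(example_means)
)
print(stability)
```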
Having a stable clustering procedure does not imply that the clustering is good. You may also be interested in cluster quality measures. In that case, you can use the WeightedCluster package, see here: http://mephisto.unige.ch/weightedcluster/
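To illustrate what a cluster quality measure looks like, here is a small sketch that computes the average silhouette width for several numbers of clusters. It deliberately uses silhouette from the cluster package rather than WeightedCluster, so it is a stand-in for the kind of measure meant here and not the WeightedCluster API. The toy R/U sequences and all object names are assumptions for illustration.

```r
library(cluster)  # provides silhouette(); a recommended package in standard R installations

set.seed(2)
# Made-up toy data: 30 sequences of length 10 over the states R and U
toy <- rbind(
  matrix(sample(c("R", "U"), 15 * 10, replace = TRUE, prob = c(0.9, 0.1)), ncol = 10),
  matrix(sample(c("R", "U"), 15 * 10, replace = TRUE, prob = c(0.1, 0.9)), ncol = 10)
)

# Hamming distances: number of positions at which two sequences differ
rows <- seq_len(nrow(toy))
toy_d <- as.dist(outer(rows, rows, Vectorize(function(i, j) sum(toy[i, ] != toy[j, ]))))

# Average-linkage tree, as in the answer above
tree <- hclust(toy_d, method = "average")

# Average silhouette width for k = 2, ..., 5 clusters (higher is better)
asw <- sapply(2:5, function(k) summary(silhouette(cutree(tree, k = k), toy_d))$avg.width)
names(asw) <- 2:5
print(round(asw, 2))
```

Comparing such a quality measure across values of k complements the stability assessment: a partition can be reproducible under resampling and still separate the sequences poorly.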
Source: https://stackoverflow.com/questions/26137925/measuring-reliability-of-tree-dendrogram-traminer