cluster one-dimensional data using pvclust

Posted by 自古美人都是妖i on 2019-12-11 09:14:17

Question


Thanks for taking the time to read this question. I have some one-dimensional data to cluster in R. The basic hclust command works fine. The pvclust command, however, does not accept one-dimensional data and keeps saying:

Error in hclust(distance, method = method.hclust) : 
  must have n >= 2 objects to cluster

I found a workaround: I added some all-zero rows to the data, so the data becomes:

       [,1]   [,2]   [,3]  [,4]  [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
[1,]  7.424 14.251 15.957 1.542 2.451 20.836 13.534 20.003 12.555 10.817
[2,]      0      0      0     0     0      0      0      0      0      0
[3,]      0      0      0     0     0      0      0      0      0      0
[4,]      0      0      0     0     0      0      0      0      0      0

Then I ran pvclust, and it worked!

But I am concerned that this workaround breaks the mathematics underlying pvclust. Can anyone tell me whether I am right or wrong, and whether there is a better solution to my problem?

Thank you!


Answer 1:


First of all, let me state that none of these methods is meant for one-dimensional data.

For one-dimensional data, please use a method that exploits the fact that the data can be sorted. For example, use a method based on kernel density estimation (KDE).

The term "cluster analysis" is usually used with multidimensional data only. In one dimension, there are much better methods. See also "natural breaks optimization", but IMHO you should be using kernel density estimation: split the data at the local minima of the KDE.
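As a rough illustration of the KDE idea, the following is a minimal sketch in base R, using the ten values from the question. The Gaussian kernel and the default bandwidth of density() are assumptions on my part, not something the question specifies; a different bandwidth can give a different number of groups, so treat the resulting split as exploratory.

```r
# Values taken from the first row of the matrix in the question.
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

# Kernel density estimate (assumption: Gaussian kernel, default bandwidth).
kde <- density(x)

# A local minimum is a grid point where the slope of the estimated
# density changes from negative to positive.
slope_sign <- sign(diff(kde$y))
minima_idx <- which(diff(slope_sign) == 2) + 1
cuts <- kde$x[minima_idx]

# Assign each value to the interval between consecutive cut points.
# With no local minimum, every value ends up in group 1.
cluster <- findInterval(x, cuts) + 1

# List the values belonging to each group.
split(x, cluster)
```

Because the cut points are thresholds on the sorted axis, every group is a contiguous range of values, which is exactly the property a general-purpose clustering method does not exploit.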

Now to your actual question. Most likely the problem is that you are passing one-dimensional data in the wrong orientation: it is interpreted as a single record with d dimensions, so the method complains about having only one sample. You may have success by first transposing your data.
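The orientation problem can be reproduced with base R alone. The sketch below uses dist() and hclust(), which treat rows as the objects to cluster; the choice of k = 3 groups is arbitrary and only there for illustration. As far as I know, pvclust clusters the columns of its input rather than the rows, so the orientation it needs may be the reverse of what dist() needs; please check its documentation before relying on this.

```r
# Values taken from the first row of the matrix in the question.
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

row_form <- matrix(x, nrow = 1)  # 1 record with 10 dimensions
col_form <- t(row_form)          # 10 records with 1 dimension each

# With a single record there are no pairwise distances at all ...
n_pairs_row <- length(dist(row_form))

# ... so hclust() has nothing to cluster and raises an error.
err <- tryCatch(hclust(dist(row_form)),
                error = function(e) conditionMessage(e))

# After transposing, each value is its own object: 10 * 9 / 2 = 45 pairs.
n_pairs_col <- length(dist(col_form))
hc <- hclust(dist(col_form))

# Cut the tree into an (arbitrarily chosen) number of groups.
groups <- cutree(hc, k = 3)
```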

With your hack of adding zero records, the result most likely becomes bogus. You are probably clustering a data set that has 1 vector that contains your data, and 3 vectors that are all zero...
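To see why the padded data set is suspicious, here is a small base-R sketch that treats the rows of the padded matrix as the objects, which is the interpretation described above. Whether pvclust sees the rows or the columns as objects is something I have not verified here, so this only illustrates the row-wise reading.

```r
# Values taken from the first row of the matrix in the question.
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

# Rebuild the padded matrix: one data row plus three all-zero rows.
padded <- rbind(x, 0, 0, 0)

# Euclidean distances between the four rows.
d <- as.matrix(dist(padded))

# The three zero rows are identical, so they merge at height 0,
# and the real data ends up as a single object on its own.
hc <- hclust(dist(padded))
hc$height
```

The tree therefore describes the relationship between one data vector and three copies of the zero vector, not any structure among the ten original values.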

But in the end, you should not be using these methods here anyway! Use a method that exploits the fact that your data can be sorted.



Source: https://stackoverflow.com/questions/16659242/cluster-one-dimensional-data-using-pvclust
