KMeans clustering for more than 5 million vectors

前端 未结 1 1417
遇见更好的自我
遇见更好的自我 2021-01-19 05:28

I have hit a real problem. I need to do some Kmeans clustering for 5 million vectors, each containing about 32 cols. I tried out Mahout which requires linux and I am on win

相关标签:
1条回答
  • 2021-01-19 06:12

    OK, So who ever wants clustering for large scale datasets, the only way of doing so is to use Mahout. IT requires a linux platform. So I had to use virtual box, placed Ubuntu on it and then used Mahout. Its a lengthy procedure to set up Mahout, but the two links that I used are as follows.

    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

    0 讨论(0)
提交回复
热议问题