Obtain the Clustered Documents of DBSCAN

Submitted by 青春壹個敷衍的年華 on 2019-12-13 11:24:11

Question


I am trying to use DBSCAN (from scikit-learn) to cluster text documents. I use TF-IDF (TfidfVectorizer in sklearn) to create the feature vector for each document.

However, I have not found a way to obtain (print) the documents that are clustered by DBSCAN.

DBSCAN in sklearn provides an attribute called 'labels_', which gives the cluster label assigned to each sample (e.g. 1, 2, 3, or -1 for noise). However, I want the documents that DBSCAN put in each cluster, not just the cluster labels.

To emphasize, I want to know which documents belong to each cluster. Could you please suggest a way to do this?

Thank you very much!


Answer 1:


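Use the labels to select documents. With labels_ taken from the fitted DBSCAN object and X holding one row per document:

X[labels_ == 1, :]

This should return all documents (rows of X) in cluster 1.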
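If X is the TF-IDF matrix, that selection gives feature rows rather than the original text. A minimal sketch of mapping the labels back to the text is below, assuming the raw documents are kept in a list in the same order they were passed to TfidfVectorizer. The function names group_documents_by_label and cluster_documents, and the eps and min_samples defaults, are illustrative choices and not part of the question or of scikit-learn.

```python
from collections import defaultdict


def group_documents_by_label(documents, labels):
    """Map each cluster label to the list of documents assigned to it.

    `labels` is expected to be aligned with `documents`, e.g. the
    `labels_` attribute of a fitted DBSCAN model.
    """
    if len(documents) != len(labels):
        raise ValueError("documents and labels must have the same length")
    clusters = defaultdict(list)
    for doc, label in zip(documents, labels):
        # int() turns NumPy integers into plain Python ints for clean dict keys.
        clusters[int(label)].append(doc)
    return dict(clusters)


def cluster_documents(documents, eps=0.5, min_samples=2):
    """Fit TF-IDF + DBSCAN and return {label: [documents]}.

    eps and min_samples are placeholder values; tune them for real data.
    """
    # Imported here so the helper above can be reused without scikit-learn.
    from sklearn.cluster import DBSCAN
    from sklearn.feature_extraction.text import TfidfVectorizer

    X = TfidfVectorizer().fit_transform(documents)
    model = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    return group_documents_by_label(documents, model.labels_)
```

Calling cluster_documents(docs) and iterating over the returned dict with .items() prints each label with its documents; the key -1, if present, holds the documents DBSCAN marked as noise.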



Source: https://stackoverflow.com/questions/50823999/obtain-the-clustered-documents-of-dbscan
