How to interpret cluster results after using Doc2vec?

时光说笑 2021-01-29 03:13

I am using doc2vec to convert the top 100 tweets of my followers into vector representations (say v1, ..., v100). After that I am using the vector representations to do the K-Means cl

3 answers
  • 2021-01-29 03:35

    These values are the coordinates of the individual tweets (or documents) in the embedding space. I am assuming that v1 to v100 are the vectors for tweets 1 to 100; otherwise this won't make sense. So if cluster 0 contains v1, v5 and v6, then tweets 1, 5 and 6, whose vector representations are v1, v5 and v6 respectively, belong to cluster 0.
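As a minimal sketch of that mapping, the snippet below groups tweet indices by cluster label. The `labels` list is made-up illustration data; in practice it would presumably be whatever your K-Means implementation returns (for example the `labels_` attribute of a fitted scikit-learn `KMeans`). Indices are 0-based, so v1 is index 0.

```python
from collections import defaultdict

def group_by_cluster(labels):
    """Map each cluster id to the list of tweet indices assigned to it."""
    clusters = defaultdict(list)
    for tweet_index, cluster_id in enumerate(labels):
        clusters[cluster_id].append(tweet_index)
    return dict(clusters)

# Hypothetical labels for six tweets: v1, v5 and v6 (indices 0, 4, 5)
# fall into cluster 0, as in the example above.
labels = [0, 1, 1, 2, 0, 0]
groups = group_by_cluster(labels)
```

Looking up `groups[0]` then gives the indices of the tweets to read when you want to see what cluster 0 contains.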

  • 2021-01-29 03:36

    The clusters themselves do not mean anything specific. You can ask for as many clusters as you want, and all the clustering algorithm does is try to distribute your vectors among them. If you know the tweets and know how many topics you want them separated into, try to clean them or give them features that the clustering algorithm can use to segregate them into the clusters of your choice.

    Also, if you meant topic modeling, that is different from clustering, and you should look that up as well.

  • 2021-01-29 03:42

    Don't interpret the individual vector components on their own. They should only be analyzed together because of the way these embeddings are trained.

    For a start:

    1. Find the document vectors most similar to your centroid to see typical cluster members.
    2. Find the term vectors from the embedding most similar to your centroid to get typical words that describe the cluster.
    3. Note the distances to see how good your fit is.
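Steps 1 and 3 can be sketched with NumPy alone. This is an illustration under assumptions, not the answerer's code: the function names, the toy 2-D vectors, and the choice of cosine similarity for "most similar" and mean Euclidean distance for "fit" are all mine.

```python
import numpy as np

def cosine_similarities(vectors, centroid):
    """Cosine similarity between each row of `vectors` and `centroid`."""
    vectors = np.asarray(vectors, dtype=float)
    centroid = np.asarray(centroid, dtype=float)
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(centroid)
    return vectors @ centroid / norms

def typical_members(vectors, labels, centroids, cluster_id, topn=3):
    """(index, similarity) pairs for the members closest to the cluster centroid."""
    member_idx = np.flatnonzero(np.asarray(labels) == cluster_id)
    sims = cosine_similarities(np.asarray(vectors)[member_idx], centroids[cluster_id])
    order = np.argsort(-sims)[:topn]  # highest similarity first
    return [(int(member_idx[i]), float(sims[i])) for i in order]

def mean_distance_to_centroid(vectors, labels, centroids, cluster_id):
    """Average Euclidean distance of a cluster's members to its centroid."""
    members = np.asarray(vectors, dtype=float)[np.asarray(labels) == cluster_id]
    return float(np.linalg.norm(members - centroids[cluster_id], axis=1).mean())

# Toy 2-D "document vectors" and labels, purely for illustration.
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 0, 1, 1])
centroids = np.array([vectors[labels == k].mean(axis=0) for k in (0, 1)])

top = typical_members(vectors, labels, centroids, cluster_id=0, topn=2)
spread = mean_distance_to_centroid(vectors, labels, centroids, cluster_id=1)
```

For step 2, if the vectors come from gensim, `model.wv.similar_by_vector(centroid, topn=10)` is one way to look up nearby words. That comparison is only meaningful if the word vectors were trained in the same space as the document vectors, which as far as I know requires `dm=1` or `dbow_words=1`.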