How to cluster similar sentences using BERT

难免孤独 2021-02-05 19:18

For ELMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.
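
The averaging-then-clustering approach described above can be sketched roughly as follows. This is only a minimal illustration using the standard library: the 3-dimensional `TOY_VECTORS` table, the example sentences, and the helper names `sentence_vector` and `kmeans` are all made up for the example. In practice the vectors would come from a trained Word2Vec, FastText or ELMo model, and the clustering step would more likely use an existing KMeans or HDBSCAN implementation.

```python
import random

# Hypothetical toy word vectors (assumption for illustration only);
# real vectors would be loaded from a trained embedding model.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.9, 0.1],
    "market": [0.1, 0.8, 0.2],
    "price": [0.05, 0.85, 0.1],
}


def sentence_vector(sentence):
    """Average the vectors of all known words in the sentence."""
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError(f"no known words in: {sentence!r}")
    # zip(*vecs) iterates over dimensions, so each column is averaged.
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


sentences = [
    "the cat and the dog",
    "my pet dog",
    "stock market price",
    "the market price",
]
labels = kmeans([sentence_vector(s) for s in sentences], k=2)
for sentence, label in zip(sentences, labels):
    print(label, sentence)
```

With these toy vectors the two animal sentences should land in one cluster and the two finance sentences in the other. Note that words missing from the vector table are simply skipped, which is one reason plain averaging can lose information.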

A good example of t

4 Answers
  •  情话喂你
    2021-02-05 20:17

    Not sure if you still need it, but a recent paper describes how to use document embeddings to cluster documents and extract words from each cluster to represent a topic. Here are the links: https://arxiv.org/pdf/2008.09470.pdf, https://github.com/ddangelov/Top2Vec

    Inspired by that paper, another topic modelling algorithm, which uses BERT to generate sentence embeddings, is described here: https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6, https://github.com/MaartenGr/BERTopic

    The above two libraries provide an end-to-end solution for extracting topics from a corpus. But if you're interested only in generating sentence embeddings, look at Gensim's doc2vec (https://radimrehurek.com/gensim/models/doc2vec.html) or at sentence-transformers (https://github.com/UKPLab/sentence-transformers), as mentioned in the other answers. If you go with sentence-transformers, it is suggested that you train a model on your domain-specific corpus to get good results.
