
How to cluster similar sentences using BERT


Asked by 难免孤独 on 2021-02-05 19:18

For ELMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.

A good example of t
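The averaging-then-clustering pipeline described in the question can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the 2-D `word_vectors` table is hypothetical toy data standing in for real ELMo/FastText/Word2Vec vectors, and `sentence_embedding` and `kmeans` are hypothetical helpers. The small k-means loop stands in for a library implementation such as scikit-learn's `KMeans`.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_embedding(tokens: Sequence[str], word_vectors: Dict[str, Vector]) -> Vector:
    """Hypothetical helper: mean-pool the vectors of all in-vocabulary tokens."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("sentence has no in-vocabulary tokens")
    return [sum(col) / len(known) for col in zip(*known)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Hypothetical helper: plain Lloyd's k-means; the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy vectors: "animal" words lie near the x-axis, "finance" words near the y-axis.
word_vectors = {
    "cat": [1.0, 0.0], "dog": [0.9, 0.1], "pet": [0.8, 0.2],
    "stock": [0.0, 1.0], "market": [0.1, 0.9], "price": [0.2, 0.8],
}
sentences = [["my", "cat", "pet"], ["stock", "price"], ["dog", "pet"], ["market", "price"]]
embeddings = [sentence_embedding(s, word_vectors) for s in sentences]  # "my" is out of vocabulary and skipped
labels = kmeans(embeddings, k=2)
print(labels)  # [0, 1, 0, 1]
```

With real embeddings it is common to L2-normalise the sentence vectors first: for unit vectors the squared Euclidean distance equals 2 minus twice the cosine similarity, so Euclidean k-means then roughly tracks cosine similarity.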

4 Answers

情话喂你 (original poster)

2021-02-05 20:17

Not sure if you still need this, but a recent paper describes how to use document embeddings to cluster documents and then extract words from each cluster to represent its topic. Paper: https://arxiv.org/pdf/2008.09470.pdf, code (Top2Vec): https://github.com/ddangelov/Top2Vec
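The topic-word step of Top2Vec can be illustrated roughly like this: documents and words live in the same embedding space, each cluster of documents is summarised by its centroid (the "topic vector"), and the words whose vectors are closest to that centroid describe the topic. This sketch covers only that last step on hypothetical toy vectors; the actual library also learns the joint document/word embeddings and finds the clusters (using UMAP and HDBSCAN), which is omitted here. `topic_words` is a hypothetical helper, not part of the Top2Vec API.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Arithmetic mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def topic_words(doc_vectors: Sequence[Vector], labels: Sequence[int],
                word_vectors: Dict[str, Vector], top_n: int = 3) -> Dict[int, List[str]]:
    """Hypothetical helper: rank words by cosine similarity to each cluster's centroid."""
    topics: Dict[int, List[str]] = {}
    for label in sorted(set(labels)):
        members = [v for v, lab in zip(doc_vectors, labels) if lab == label]
        topic_vector = centroid(members)
        ranked = sorted(word_vectors, key=lambda w: cosine(topic_vector, word_vectors[w]), reverse=True)
        topics[label] = ranked[:top_n]
    return topics


# Hypothetical toy data: document and word vectors share one 2-D space.
word_vectors = {
    "cat": [1.0, 0.0], "dog": [0.9, 0.1], "pet": [0.8, 0.2],
    "stock": [0.0, 1.0], "market": [0.1, 0.9], "price": [0.2, 0.8],
}
doc_vectors = [[0.9, 0.1], [0.1, 0.9], [0.85, 0.15], [0.15, 0.85]]
labels = [0, 1, 0, 1]  # assumed to come from a clustering step such as HDBSCAN
topics = topic_words(doc_vectors, labels, word_vectors, top_n=2)
print(topics)  # {0: ['dog', 'pet'], 1: ['market', 'price']}
```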

Inspired by that paper, BERTopic is another topic-modelling algorithm, one that uses BERT to generate sentence embeddings. Article: https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6, code: https://github.com/MaartenGr/BERTopic
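After clustering the embeddings, BERTopic extracts topic words with a class-based TF-IDF: all documents in a cluster are treated as one large "class document", and words are scored by how frequent they are in that cluster relative to the other clusters. The sketch below is a simplified version of that idea (plain TF-IDF with a smoothed IDF over the merged class documents, whitespace tokenisation, hypothetical toy data); it is not BERTopic's exact weighting formula, and `class_tfidf_keywords` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def class_tfidf_keywords(docs: Sequence[str], labels: Sequence[int],
                         top_n: int = 3) -> Dict[int, List[str]]:
    """Hypothetical helper: simplified class-based TF-IDF over merged cluster documents."""
    # 1. Merge all documents sharing a cluster label into one bag of words.
    class_counts: Dict[int, Counter] = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(doc.lower().split())

    # 2. Document frequency of each term, counted over classes rather than original documents.
    n_classes = len(class_counts)
    df: Counter = Counter()
    for counts in class_counts.values():
        df.update(counts.keys())

    # 3. Score each term per class (term frequency x smoothed IDF) and keep the top_n.
    keywords: Dict[int, List[str]] = {}
    for label, counts in sorted(class_counts.items()):
        total = sum(counts.values())
        scores = {term: (count / total) * math.log(1 + n_classes / df[term])
                  for term, count in counts.items()}
        keywords[label] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return keywords


# Hypothetical toy corpus; labels are assumed to come from clustering the sentence embeddings.
docs = ["the cat pet", "the stock price", "the dog pet", "the market price"]
labels = [0, 1, 0, 1]
print(class_tfidf_keywords(docs, labels, top_n=1))  # {0: ['pet'], 1: ['price']}
```

"the" occurs as often in each cluster as "pet" or "price", but because it appears in every cluster its IDF is lower, so it ranks below the cluster-specific words.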

Both libraries provide an end-to-end solution for extracting topics from a corpus. If you only need sentence embeddings, look at Gensim's doc2vec (https://radimrehurek.com/gensim/models/doc2vec.html) or at sentence-transformers (https://github.com/UKPLab/sentence-transformers), as mentioned in the other answers. If you go with sentence-transformers, training a model on your domain-specific corpus is recommended to get good results.
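Once you have sentence embeddings (with sentence-transformers they typically come from `SentenceTransformer(model_name).encode(sentences)`), one option that avoids choosing the number of clusters up front is to group sentences whose embeddings exceed a cosine-similarity threshold. The sketch below is a simple greedy variant of that idea in plain Python; `threshold_clusters`, the 0.9 threshold and the toy vectors are hypothetical, and the threshold would need tuning for real embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def threshold_clusters(embeddings: Sequence[Vector], threshold: float = 0.9) -> List[List[int]]:
    """Hypothetical helper: each unassigned sentence seeds a cluster and pulls in every
    other unassigned sentence whose cosine similarity to the seed is >= threshold."""
    unassigned = list(range(len(embeddings)))
    clusters: List[List[int]] = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [i for i in unassigned
                            if cosine(embeddings[seed], embeddings[i]) >= threshold]
        unassigned = [i for i in unassigned if i not in members]
        clusters.append(members)
    return clusters


# Hypothetical toy embeddings: two tight groups plus one sentence that fits neither.
embeddings = [[0.9, 0.1], [0.1, 0.9], [0.85, 0.15], [0.15, 0.85], [0.7, 0.7]]
print(threshold_clusters(embeddings))  # [[0, 2], [1, 3], [4]]
```

Unlike k-means, sentences that are not similar enough to any seed end up in their own singleton cluster instead of being forced into a group, which is often closer to what "cluster similar sentences" means in practice.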
