How to cluster similar sentences using BERT

前端 未结 4 2059
难免孤独
难免孤独 2021-02-05 19:18

For ElMo, FastText and Word2Vec, I\'m averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.

A good example of t

4条回答
  •  北恋
    北恋 (楼主)
    2021-02-05 20:06

    You can use Sentence Transformers to generate the sentence embeddings. These embeddings are much more meaningful as compared to the one obtained from bert-as-service, as they have been fine-tuned such that semantically similar sentences have higher similarity score. You can use FAISS based clustering algorithm if number of sentences to be clustered are in millions or more as vanilla K-means like clustering algorithm takes quadratic time.

提交回复
热议问题