How to cluster similar sentences using BERT

难免孤独 2021-02-05 19:18

For ELMo, FastText, and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.

A good example of t

4 Answers

梦如初夏 (original poster)

2021-02-05 20:20
You will need to generate BERT embeddings for the sentences first. The bert-as-service package provides an easy way to do this.

This is how you can generate BERT vectors for the list of sentences you need to cluster. It is explained very well in the bert-as-service repository: https://github.com/hanxiao/bert-as-service

Installation:
```
pip install bert-serving-server  # server
pip install bert-serving-client  # client, independent of `bert-serving-server`
```
Download one of the pre-trained models available at https://github.com/google-research/bert

Start the service:
```
bert-serving-start -model_dir /your_model_directory/ -num_worker=4 
```
Generate the vectors for the list of sentences:
```
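from bert_serving.client import BertClient

bc = BertClient()  # connects to the server started above (localhost by default)
vectors = bc.encode(your_list_of_sentences)  # one embedding vector per sentence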
```
This gives you one vector per sentence. Because the sentences are now represented as numbers, you can write the vectors to a CSV file and apply any clustering algorithm to them.
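To illustrate the clustering step, here is a minimal sketch of k-means over sentence vectors using only the standard library. The helper names `normalize`, `squared_distance`, and `kmeans` are hypothetical and not part of bert-as-service, and the toy 2-dimensional vectors stand in for real BERT embeddings. In practice you would more likely pass the vectors to a library implementation of KMeans or HDBSCAN, as mentioned in the question.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so Euclidean distance tracks cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per input vector."""
    points = [normalize(v) for v in vectors]
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy stand-ins for sentence embeddings (real BERT vectors are much longer).
    toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(kmeans(toy_vectors, k=2))
```

With real data you would call `kmeans(vectors, k)` on the output of `bc.encode(...)` and group the sentences by the returned labels. The number of clusters `k` is something you have to choose yourself.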