For ElMo, FastText and Word2Vec, I\'m averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.
A good example of t
You will need to generate bert embeddidngs for the sentences first. bert-as-service provides a very easy way to generate embeddings for sentences.
This is how you can geberate bert vectors for a list of sentences you need to cluster. It is explained very well in the bert-as-service repository: https://github.com/hanxiao/bert-as-service
Installations:
pip install bert-serving-server # server
pip install bert-serving-client # client, independent of `bert-serving-server`
Download one of the pre-trained models available at https://github.com/google-research/bert
Start the service:
bert-serving-start -model_dir /your_model_directory/ -num_worker=4
Generate the vectors for the list of sentences:
from bert_serving.client import BertClient
bc = BertClient()
vectors=bc.encode(your_list_of_sentences)
This would give you a list of vectors, you could write them into a csv and use any clustering algorithm as the sentences are reduced to numbers.