Given data that looks like this:
{'_id': 'foobar1',
'about': 'similarity in comparison',
'categories': ['one', 'two', 'three']}
{'_id': 'f
If you need to compute text similarity on the about field, one way to achieve this is to use a text index.
For example (in the mongo shell), if you create a text index on the about field:
db.collection.createIndex({about: 'text'})
you could execute a query such as (example taken from https://docs.mongodb.com/manual/reference/operator/query/text/#sort-by-text-search-score):
db.collection.find({$text: {$search: 'similarity in comparison'}}, {score: {$meta: 'textScore'}}).sort({score: {$meta: 'textScore'}})
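If you are querying from application code rather than the shell, the same index and query might look roughly like this. This is a minimal sketch assuming the official `mongodb` Node.js driver (4.x or later), where `find()` returns a cursor with `project()`, `sort()` and `toArray()`. The helper names `ensureAboutTextIndex` and `searchBySimilarity`, and the database name in the usage comment, are hypothetical.

```javascript
// Create the same text index as db.collection.createIndex({about: 'text'}).
// `collection` is assumed to be a Collection object from the mongodb Node.js driver.
async function ensureAboutTextIndex(collection) {
  return collection.createIndex({ about: 'text' });
}

// Run the $text search, project the relevance score, and sort by it (highest first).
async function searchBySimilarity(collection, text) {
  const scoreMeta = { $meta: 'textScore' };
  return collection
    .find({ $text: { $search: text } })
    .project({ score: scoreMeta })
    .sort({ score: scoreMeta })
    .toArray();
}

// Usage (assumes a running server and the `mongodb` package; 'test' is a hypothetical db name):
//   const { MongoClient } = require('mongodb');
//   const client = new MongoClient('mongodb://localhost:27017');
//   const coll = client.db('test').collection('collection');
//   await ensureAboutTextIndex(coll);
//   console.log(await searchBySimilarity(coll, 'similarity in comparison'));
```

Passing the `$meta` expression to both `project()` and `sort()` mirrors the shell query above, so each returned document carries the score it was ranked by.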
With your example documents, the query should return something like:
{
"_id": "foobar1",
"about": "similarity in comparison",
"score": 1.5
}
{
"_id": "foobar2",
"about": "perfect similarity in comparison",
"score": 1.3333333333333333
}
{
"_id": "foobar3",
"about": "partial similarity",
"score": 0.75
}
which are sorted by decreasing similarity score. Please note that, unlike your example result, document foobar4 is not returned because none of the queried words are present in foobar4.
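The reason a document can drop out entirely is that `$search` with several space-separated words performs a logical OR over the terms, so a document must contain at least one of them to match. The following is only a simplified illustration of that matching rule, not of MongoDB's implementation: it assumes lowercase tokenizing on non-alphanumeric characters and a tiny hypothetical stop-word list, ignores stemming, and does not compute `textScore`. The last document below is made up for illustration.

```javascript
// Hypothetical, tiny stop-word list; real text indexes use a full language-specific list.
const STOP_WORDS = new Set(['in', 'the', 'a', 'an', 'of']);

// Lowercase, split on anything that is not a letter or digit, drop empties and stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

// OR semantics: the document matches if it shares at least one term with the search.
function matchesTextSearch(doc, search) {
  const docTerms = new Set(tokenize(doc.about));
  return tokenize(search).some((term) => docTerms.has(term));
}

const docs = [
  { _id: 'foobar1', about: 'similarity in comparison' },
  { _id: 'foobar3', about: 'partial similarity' },
  { _id: 'other', about: 'completely unrelated words' }, // hypothetical document
];
const matched = docs
  .filter((d) => matchesTextSearch(d, 'similarity in comparison'))
  .map((d) => d._id);
console.log(matched); // [ 'foobar1', 'foobar3' ]
```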
Text indexes are considered a special type of index in MongoDB, and thus come with some specific rules on their usage. For more details, please see: