I know that Elasticsearch takes the length of a field into account when computing the score of the documents retrieved by a query: the shorter the field, the higher the weight.
I have something that kind of works. With the following, I subtract the length of the field I am interested in from the score.
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {...},
      "script_score": {
        "script": "_score - doc['<field_name>'].value.length()"
      }
    }
  }
}
However, I cannot control the relative weight of the number I am subtracting compared to the original score. That's why I am not accepting my own answer: I'll wait a while for better ones. Ideally, I'd love to have a way to access the field length norm function within the script_score, or to get an equivalent result.
It looks like you could achieve that using a field of type token_count together with a field_value_factor function score.
So, something like this in the field mapping:
"name": {
  "type": "string",
  "fields": {
    "length": {
      "type": "token_count",
      "analyzer": "standard"
    }
  }
}
This will use the number of tokens in the field. If you want to use the number of characters, you can change the analyzer from standard to a custom one that tokenizes each character.
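As a rough sketch of such a per-character analyzer, one option is an ngram tokenizer with both gram sizes set to 1, so every character becomes its own token. The names single_char_tokenizer and char_count_analyzer are made up for illustration, and the exact settings syntax may differ between Elasticsearch versions, so treat this as a starting point rather than a tested configuration:

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "single_char_tokenizer": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 1
        }
      },
      "analyzer": {
        "char_count_analyzer": {
          "type": "custom",
          "tokenizer": "single_char_tokenizer"
        }
      }
    }
  }
}
```

The "analyzer" of the token_count sub-field would then point to char_count_analyzer instead of standard. Without a token_chars restriction the ngram tokenizer should also emit whitespace and punctuation, so the count would be the raw character length.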
Then in the query:
"function_score": {
  ...,
  "field_value_factor": {
    "field": "name.length",
    "modifier": "reciprocal"
  }
}
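For reference, here is what a complete request might look like. The match query and its text are placeholders I made up, and factor and boost_mode are spelled out only to show where the relative weight can be tuned. As far as I know, field_value_factor computes modifier(factor * value), so with reciprocal the function yields 1 / (factor * token count), and boost_mode decides how that is combined with the query score:

```json
{
  "query": {
    "function_score": {
      "query": {
        "match": { "name": "some search text" }
      },
      "field_value_factor": {
        "field": "name.length",
        "modifier": "reciprocal",
        "factor": 1.0
      },
      "boost_mode": "multiply"
    }
  }
}
```

With multiply (the default), a document whose name has 4 tokens would have its query score multiplied by 0.25, and one with 2 tokens by 0.5. Lowering factor below 1 makes the function value larger, and a different boost_mode such as sum or avg changes how strongly length affects the final ranking.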