Weighted random sampling in Elasticsearch

Posted by 孤人 on 2019-11-28 23:49:07

In case it helps anyone, here is how I recently implemented weighted shuffling.

In this example, we shuffle companies. Each company has a "company_score" between 0 and 100. With this simple weighted shuffling, the combined score of a company with score 100 is on average 5 times that of a company with score 20, so the higher-scored company is far more likely to appear on the first page.

main_query = {"match_all": {}}  # placeholder (assumption); replace with your own query

json_body = {
    "sort": ["_score"],
    "query": {
        "function_score": {
            "query": main_query,  # put your main query here
            "functions": [
                {
                    "random_score": {},
                },
                {
                    "field_value_factor": {
                        "field": "company_score",
                        "modifier": "none",
                        "missing": 0,
                    }
                }
            ],
            # How to combine the results of the two functions 'random_score' and 'field_value_factor'.
            # With 'multiply', the combined _score of a company with score 100 is on average 5 times
            # the combined _score of a company with score 20, so it is far more likely
            # to appear on the first page.
            "score_mode": "multiply",
            # How to combine the result of function_score with the original _score from the query.
            # We overwrite it as our combined _score (random x company_score) is all we need.
            "boost_mode": "replace",
        }
    }
}
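
One practical caveat: as far as I know, an empty `random_score` draws fresh random values on every request, so paging through results can repeat or skip companies. Here is a minimal sketch of a helper that builds the same body with an optional seed. It assumes Elasticsearch 7.x or later, where `random_score` accepts `seed` together with `field`. The helper name `build_weighted_shuffle_body` and the choice of `_seq_no` as the field are my own illustrative assumptions, not part of the original answer.

```python
def build_weighted_shuffle_body(main_query, weight_field="company_score", seed=None):
    """Build a function_score body that multiplies a random score by a weight field.

    Hypothetical helper for illustration. When `seed` is given, the same seed
    should yield the same ordering across requests (assumes ES 7.x+).
    """
    random_score = {}
    if seed is not None:
        # Assumption: `_seq_no` is used as the per-document source of randomness.
        random_score = {"seed": seed, "field": "_seq_no"}
    return {
        "sort": ["_score"],
        "query": {
            "function_score": {
                "query": main_query,
                "functions": [
                    {"random_score": random_score},
                    {
                        "field_value_factor": {
                            "field": weight_field,
                            "modifier": "none",
                            "missing": 0,
                        }
                    },
                ],
                "score_mode": "multiply",   # random value x weight
                "boost_mode": "replace",    # ignore the query's own _score
            }
        },
    }


# Reuse one seed per user session so that pages 1, 2, 3... stay consistent.
body = build_weighted_shuffle_body({"match_all": {}}, seed=42)
```

Keeping one seed per user session (and changing it between sessions) should give each visitor a stable but different shuffle.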

I know this question is old, but answering for any future searchers.

The comment before yours in the GitHub thread seems to have the answer. If each of your documents has a relative weight, then you can pick a random score for each document and multiply it by the weight to create your new weighted random score. This has the added bonus of not needing the sum of weights.

For example, if two documents have weights 1 and 2, you'd expect the second to be favored over the first. Give each document a random score between 0 and 1 (which you're already doing with "random_score"). Multiply the random score by the weight and the first document gets a score between 0 and 1 while the second gets a score between 0 and 2, so the second scores twice as high on average. Note that this is not exactly "twice as likely to be selected": with only these two documents, the second ranks first with probability 3/4, because the first wins only when its random value is more than twice the other's.
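
To check how the multiply-by-weight trick behaves, here is a small simulation in plain Python (no Elasticsearch involved). It is a sketch under the assumption that `random_score` values are independent and uniform on [0, 1). The function name `top_rank_frequency` is made up for this example.

```python
import random


def top_rank_frequency(weights, trials=100_000, seed=0):
    """Estimate how often each document gets the highest weighted random score."""
    rng = random.Random(seed)
    wins = [0] * len(weights)
    for _ in range(trials):
        # Same idea as score_mode "multiply": uniform random value times the weight.
        scores = [w * rng.random() for w in weights]
        wins[scores.index(max(scores))] += 1
    return [count / trials for count in wins]


# Two documents with weights 1 and 2: the second should come first
# roughly 75% of the time, not 67%.
print(top_rank_frequency([1, 2]))

# Company scores 20 and 100: the second should come first roughly 90% of the time.
print(top_rank_frequency([20, 100]))
```

So the ordering is weighted in the right direction, but the odds are steeper than the raw weight ratio. If you need selection probabilities exactly proportional to the weights, this simple product is only an approximation.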
