问题
I have two lists of ranked items. Each item has an rank and an associated score. The score has decided the rank. The two lists can contains (and usually do) different items, that is their intersection can be empty. I need measures to compare such rankings. Are there well-known algorithms (in literature or real-world systems) to do so ? The measure of distance should take into account the scores as well as the ranks of the items.
回答1:
This question has never been answered before, but I still think it's important to a lot of people out there:
Your two requirements, i.e. non-conjointness of lists and importance of ranks are not met by common correlation tests. In addition to that most of them (Kendall-Tau for example) do not take the order into account:
>>> from scipy.stats import kendalltau
>>> kendalltau([1,2,3,4,5], [2,1,3,4,5])
KendalltauResult(correlation=0.79999999999999982, value=0.050043527347496564)
>>> kendalltau([1,2,3,4,5], [1,2,3,5,4])
KendalltauResult(correlation=0.79999999999999982, value=0.050043527347496564)
The 1st comparison should yield a significantly smaller value than the 2nd one, because the head of the list is more important than the tail (2nd requirement).
In addition to that one can see that both lists need to be the same size and have the same kind of elements (1st requirement)
Possible solution:
The measure that satisfies all your needs is called Rank Biased Overlap. It's a generalization of the so called average based overlap, which is wonderfully illustrated in this blog. The same guy also put out an implementation of RBO.
Update Jan 2018:
- Another implementation of RBO for python 3.5.2
回答2:
Maybe not solving the issue completely, but definitely worth taking a look at Kendall's weighted tau.
It provides a better way of calculating similarity between ranked lists when order matters as i it allows arbitrary weighting based on rank order.
For example, one may be more interested in upweighting similarity in the top 20 items of the list rather than uniformly.
Also has a nice implementation in scipy.
回答3:
There are many measures to compare top k (ranked) lists. Some very trivial to compute making several simplifying assumptions, others not so trivial but are more rigorous in their evaluation of rank similarity between lists. A recent paper I came across that deals with this problem in a statistically meaningful way, using concepts from information theory and data compression: http://arxiv.org/abs/1310.0110
来源:https://stackoverflow.com/questions/13574406/how-to-compare-ranked-lists