Question
Let's say I have a list:
[4, 5, 2, 1]
I need to rank these and have the output as:
[3, 4, 2, 1]
If two items tie, as in [4, 4, 2, 3], then their rankings should be averaged -> [3.5, 3.5, 1, 2]
EDIT
Here, rank means the 1-based position of a number in the sorted list. If several numbers have the same value, the rank of each of them is the average of their positions.
Answer 1:
Probably not the most efficient, but this works.
rank takes an item and a sorted list, and figures out what the rank of that item should be by finding where it would be inserted to go before all elements equal to it, and where it would be inserted to go after them, then averaging the two positions (using array bisection).
rank_list uses rank to figure out the ranks of all elements. The partial call is just a simplification, so the list does not have to be sorted again for each item lookup.
Like so:
from bisect import bisect_left, bisect_right
from functools import partial

def rank(item, lst):
    '''return rank of item in sorted list'''
    return (1 + bisect_left(lst, item) + bisect_right(lst, item)) / 2.0

def rank_list(lst):
    f = partial(rank, lst=sorted(lst))
    return [f(i) for i in lst]

rank_list([4, 4, 2, 1])
## [3.5, 3.5, 2.0, 1.0]
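To see why the formula in rank yields the average position, it may help to look at the two bisection results separately. The sketch below is only an illustration; the names sorted_values, first, last and tied_positions are made up for it. bisect_left counts the elements strictly smaller than the item, so adding 1 gives the 1-based position of the first tied element, while bisect_right already equals the 1-based position of the last one.

```python
from bisect import bisect_left, bisect_right

sorted_values = sorted([4, 4, 2, 3])  # [2, 3, 4, 4]
item = 4

# 1-based position of the first element equal to item
first = bisect_left(sorted_values, item) + 1
# 1-based position of the last element equal to item
last = bisect_right(sorted_values, item)

tied_positions = list(range(first, last + 1))
print(tied_positions)                             # [3, 4]
print(sum(tied_positions) / len(tied_positions))  # 3.5
print((first + last) / 2.0)                       # 3.5, same as rank(4, sorted_values)
```

Because the tied positions are consecutive integers, their mean is just the midpoint of the first and last, which is what (1 + bisect_left + bisect_right) / 2.0 computes.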
Answer 2:
I found an answer to this here: Efficient method to calculate the rank vector of a list in Python
def rank_simple(vector):
    # indices of vector, ordered by the values they point to
    return sorted(range(len(vector)), key=vector.__getitem__)

def rankdata(a):
    n = len(a)
    ivec = rank_simple(a)
    svec = [a[rank] for rank in ivec]  # the values in sorted order
    sumranks = 0
    dupcount = 0
    newarray = [0] * n
    for i in range(n):
        sumranks += i
        dupcount += 1
        # end of a run of equal values: assign the averaged 1-based rank
        if i == n - 1 or svec[i] != svec[i + 1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i - dupcount + 1, i + 1):
                newarray[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newarray
I would like to see if there are any simpler or more efficient ways of doing this.
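One possibly simpler variant, offered only as a sketch and not as a benchmarked improvement: group the 1-based sorted positions by value in a dictionary, then average each group. The function name average_ranks is made up for this example, and it assumes Python 3 (true division) and hashable list elements.

```python
from collections import defaultdict

def average_ranks(values):
    """Rank values by 1-based sorted position, averaging the positions of ties."""
    positions = defaultdict(list)
    # Record every 1-based sorted position at which each value occurs.
    for position, value in enumerate(sorted(values), start=1):
        positions[value].append(position)
    # Average the positions of each distinct value.
    mean_position = {v: sum(p) / len(p) for v, p in positions.items()}
    # Report the ranks in the original input order.
    return [mean_position[v] for v in values]

print(average_ranks([4, 5, 2, 1]))  # [3.0, 4.0, 2.0, 1.0]
print(average_ranks([4, 4, 2, 3]))  # [3.5, 3.5, 1.0, 2.0]
```

Like rankdata, this sorts once and then makes a linear pass, so the sort should dominate the running time. The trade-off is the extra dictionary and the hashability requirement.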
Source: https://stackoverflow.com/questions/29294877/how-do-i-rank-a-list-in-vanilla-python