Python: sort an array of dictionaries with custom comparator?


I have the following Python array of dictionaries:

myarr = [ { 'name': 'Richard', 'rank': 1 },
{ 'name': 'Reuben', 'rank': 4 },
{ 'name': 'Reece'


        
8 answers
  •  南方客 (original poster)
    2021-02-08 10:56

    Option 1:

    key=lambda d:(d['rank']==0, d['rank'])
    

    Option 2:

    key=lambda d:d['rank'] if d['rank']!=0 else float('inf')
    

    Demo:

    "I'd like to sort it by the rank values, ordering as follows: 1-2-3-4-0-0-0." --original poster

    >>> sorted([0,0,0,1,2,3,4], key=lambda x:(x==0, x))
    [1, 2, 3, 4, 0, 0, 0]
    
    >>> sorted([0,0,0,1,2,3,4], key=lambda x:x if x!=0 else float('inf'))
    [1, 2, 3, 4, 0, 0, 0]
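
    The same two keys applied to a list of dictionaries, as a minimal sketch. The question's list is truncated, so only the first two records come from it; the remaining names and ranks are invented for illustration.

```python
# First two records are from the question; the rest are hypothetical,
# because the original list is cut off.
records = [
    {'name': 'Richard', 'rank': 1},
    {'name': 'Reuben', 'rank': 4},
    {'name': 'Ann', 'rank': 0},
    {'name': 'Bob', 'rank': 3},
    {'name': 'Cat', 'rank': 2},
    {'name': 'Dan', 'rank': 0},
]

# Option 1: tuple key, zero ranks get True in the first slot and sort last.
by_tuple = sorted(records, key=lambda d: (d['rank'] == 0, d['rank']))

# Option 2: replace a zero rank with infinity.
by_inf = sorted(records, key=lambda d: d['rank'] if d['rank'] != 0 else float('inf'))

names = [d['name'] for d in by_tuple]
ranks = [d['rank'] for d in by_tuple]
print(names)  # ['Richard', 'Cat', 'Bob', 'Reuben', 'Ann', 'Dan']
print(ranks)  # [1, 2, 3, 4, 0, 0]
```

    Because sorted() is stable, the rank-0 records keep their original relative order under both keys.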
    

     

    Additional comments:

    "Please could you explain to me (a Python novice) what it's doing? I can see that it's a lambda, which I know is an anonymous function: what's the bit in brackets?" – OP comment

    Indexing/slice notation:

    itemgetter('rank') is the same thing as lambda x: x['rank'] is the same thing as the function:

    def getRank(myDict):
        return myDict['rank']
    

    The [...] is called indexing (or slice) notation; see "Explain Python's slice notation". Note that someArray[n] is common indexing notation in many programming languages, though not all of them support slices of the form [start:end] or [start:end:step].
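
    A small sketch of the equivalence described above, plus the two slice forms. The names get_rank, record and ranks are illustrative only.

```python
from operator import itemgetter

def get_rank(my_dict):
    # Plain-function version of the lambda below.
    return my_dict['rank']

record = {'name': 'Richard', 'rank': 1}

# Three ways to spell "look up the 'rank' entry".
results = [
    itemgetter('rank')(record),
    (lambda x: x['rank'])(record),
    get_rank(record),
]
print(results)  # [1, 1, 1]

# Indexing versus slicing on a list.
ranks = [1, 4, 0, 3, 2]
print(ranks[1])    # 4
print(ranks[1:3])  # [4, 0]
print(ranks[::2])  # [1, 0, 2]
```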

    key= vs cmp= vs rich comparison:

    As for what is going on, there are two common ways to specify how a sorting algorithm orders elements: one is with a key function, and the other is with a cmp function (removed from sorted() and list.sort() in Python 3, where functools.cmp_to_key adapts one into a key function, but a lot more versatile). A cmp function allows you to arbitrarily specify how two elements should compare (input: a, b; output: whether a<b, a>b or a==b, returned as a negative, positive or zero number). Though legitimate, it gives us no major benefit here (we'd have to duplicate code in an awkward manner), and a key function is more natural for your case. (See "object rich comparison" below for how to implicitly define cmp= in an elegant but possibly-excessive way.)
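
    For comparison, here is roughly what the cmp-style version could look like in Python 3, using functools.cmp_to_key. This is a sketch, not the cmp solution referred to elsewhere in the thread; compare_rank and the sample data are made up.

```python
from functools import cmp_to_key

def compare_rank(a, b):
    """Return a negative number if a sorts before b, positive if after, 0 if tied."""
    a_zero = a['rank'] == 0
    b_zero = b['rank'] == 0
    if a_zero != b_zero:
        # Exactly one of the two has rank 0: that one goes last.
        return 1 if a_zero else -1
    # Otherwise compare the ranks numerically.
    return (a['rank'] > b['rank']) - (a['rank'] < b['rank'])

# Hypothetical sample data.
records = [{'rank': r} for r in (0, 3, 1, 0, 4, 2)]
ordered = [d['rank'] for d in sorted(records, key=cmp_to_key(compare_rank))]
print(ordered)  # [1, 2, 3, 4, 0, 0]
```

    Note how the "is it zero?" test has to be written for both arguments, which is the duplication the key-based version avoids.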

    Implementing your key function:

    Unfortunately 0 is an integer and thus has a natural ordering: 0 is normally < 1, 2, 3... Thus if we want to impose an extra rule, we need to sort the list at a "higher level". We do this by making the key a tuple: tuples are compared first by their 1st element, then by their 2nd element. False is always ordered before True, so all keys starting with False (nonzero ranks) come before all keys starting with True (zero ranks); within each group the second element then sorts as normal: (False,1)<(False,2)<(False,3)<..., and (False,*)<(True,*). The alternative (option 2) merely assigns rank-0 dictionaries a key of infinity, since that compares above any finite rank.
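
    The comparison rules relied on here can be checked directly. A short sketch with made-up sample ranks:

```python
# Booleans are ordered False < True.
bool_order = False < True

# Tuples compare element by element, so the first slot dominates.
zero_goes_last = (False, 4) < (True, 0)
ties_use_second = (False, 1) < (False, 2)

# The keys that option 1 would generate for some sample ranks.
keys = sorted((r == 0, r) for r in [0, 3, 1, 0, 2])
print(keys)
# [(False, 1), (False, 2), (False, 3), (True, 0), (True, 0)]

# Option 2 relies on infinity comparing above any finite number.
inf_is_largest = float('inf') > 10**100
```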

    More general alternative - object rich comparison:

    The even more general solution would be to create a class representing records, then implement __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ (the rich comparison methods), or alternatively just implement one of the ordering methods plus __eq__ and use the @functools.total_ordering decorator. This will cause objects of that class to use the custom logic whenever you use comparison operators (e.g. x=Record(name='Joe', rank=12); y=Record(...); x<y). Since the sorted(...) function compares elements with < by default, this makes the behavior automatic when sorting, and in any other place where you use < and the other comparison operators. This may or may not be excessive depending on your use case.
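
    A minimal sketch of that idea, assuming a Record class with just name and rank fields. The _sort_key helper is an invented name, and this version treats two records with the same rank as equal, which may not be what you want in real code.

```python
from functools import total_ordering

@total_ordering
class Record:
    def __init__(self, name, rank):
        self.name = name
        self.rank = rank

    def _sort_key(self):
        # Same tuple trick as option 1: rank 0 sorts after everything else.
        return (self.rank == 0, self.rank)

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._sort_key() == other._sort_key()

    def __lt__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._sort_key() < other._sort_key()

    def __repr__(self):
        return f"Record(name={self.name!r}, rank={self.rank})"

# Hypothetical sample data.
people = [Record('Ann', 0), Record('Bob', 3), Record('Cat', 1), Record('Dan', 2)]
ordered_names = [r.name for r in sorted(people)]  # no key= needed
print(ordered_names)  # ['Cat', 'Dan', 'Bob', 'Ann']
```

    total_ordering fills in __le__, __gt__ and __ge__ from __lt__ and __eq__, so Record('Ann', 0) >= Record('Bob', 3) also works.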

    Cleaner alternative - don't overload 0 with semantics:

    I should however point out that it's a bit artificial to put 0s behind 1, 2, 3, 4, etc. Whether this is justified depends on whether rank=0 really means a rank of 0, i.e. whether rank-0 records are really "lower" than rank-1 records (which in turn are really "lower" than rank-2 records...). If this is truly the case, then your method is perfectly fine. If it is not, you might consider omitting the 'rank' entry entirely as opposed to setting 'rank': 0. Then you could sort as in Lev Levitsky's answer, using 'rank' in d, or by:

    Option 1 with different scheme:

    key=lambda d: ('rank' not in d, d.get('rank', 0))
    

    Option 2 with different scheme:

    key=lambda d: d.get('rank', float('inf'))
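
    Both keys for the "missing rank" scheme, sketched on invented data. The tuple key uses d.get('rank', 0) for its second slot, because indexing d['rank'] directly would raise KeyError for the unranked records.

```python
# Hypothetical data: unranked records simply have no 'rank' entry.
records = [
    {'name': 'Richard', 'rank': 1},
    {'name': 'Reuben', 'rank': 4},
    {'name': 'Ann'},
    {'name': 'Bob', 'rank': 2},
    {'name': 'Cat'},
]

# Option 1: tuple key; the 0 default is only a placeholder for missing ranks.
by_tuple = sorted(records, key=lambda d: ('rank' not in d, d.get('rank', 0)))

# Option 2: missing rank counts as infinity.
by_inf = sorted(records, key=lambda d: d.get('rank', float('inf')))

names = [d['name'] for d in by_tuple]
print(names)  # ['Richard', 'Bob', 'Reuben', 'Ann', 'Cat']
```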
    

    Side note: relying on the existence of infinity in Python is borderline a hack. That makes the other solutions mentioned (tuples, object comparison, Lev's filter-then-concatenate solution, and maybe even the slightly more complicated cmp solution typed up by wilson) more generalizable to other languages.
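
    For reference, a filter-then-concatenate approach could look roughly like this. It is a sketch of the general idea on made-up data, not necessarily the code from Lev's answer.

```python
# Hypothetical sample data.
records = [{'rank': r} for r in (0, 3, 1, 0, 4, 2)]

# Sort only the records with a real rank...
ranked = sorted((d for d in records if d['rank'] != 0), key=lambda d: d['rank'])
# ...and append the rank-0 records afterwards, in their original order.
unranked = [d for d in records if d['rank'] == 0]

ordered = [d['rank'] for d in ranked + unranked]
print(ordered)  # [1, 2, 3, 4, 0, 0]
```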
