python, convert a dictionary to a sorted list by value instead of key

后端 未结 7 1112
别跟我提以往
别跟我提以往 2021-02-04 03:45

I have a collections.defaultdict(int) that I\'m building to keep count of how many times a key shows up in a set of data. I later want to be able to sort it (obviously by turnin

相关标签:
7条回答
  • 2021-02-04 04:16
    from collections import defaultdict
    adict = defaultdict(int)
    
    adict['a'] += 1
    adict['b'] += 3
    adict['c'] += 5
    adict['d'] += 2
    
    for key, value in sorted(adict.items(), lambda a, b: cmp(a[1], b[1]), reverse=True):
        print "%r => %r" % (key, value)
    
    >>> 
    'c' => 5
    'b' => 3
    'd' => 2
    'a' => 1
    

     

    0 讨论(0)
  • 2021-02-04 04:18

    "Invert" a dictionary.

    from collections import defaultdict
    inv_dict = defaultdict( list )
    for key, value in adict:
        inv_dict[value].append( key )
    max_value= max( inv_dict.keys() )
    

    The set of keys with the maximum occurrence --

    inv_dict[max_value] 
    

    The set of keys in descending order by occurrence --

    for value, key_list in sorted( inv_dict ):
        print key_list, value
    
    0 讨论(0)
  • 2021-02-04 04:20

    Note: I'm putting this in as an answer so that it gets seen. I don't want upvotes. If you want to upvote anyone, upvote Nadia.

    The currently accepted answer gives timing results which are based on a trivially small dataset (size == 6 - (-5) == 11). The differences in cost of the various methods are masked by the overhead. A use case like what are the most frequent words in a text or most frequent names in a membership list or census involves much larger datasets.

    Repeating the experiment with range(-n,n+1) (Windows box, Python 2.6.4, all times in microseconds):

    n=5: 11.5, 9.34, 11.3
    n=50: 65.5, 46.2, 68.1
    n=500: 612, 423, 614

    These results are NOT "slightly" different. The itemgetter answer is a clear winner on speed.

    There was also mention of "the simplicity of the get idiom". Putting them close together for ease of comparison:

    [(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)] sorted(adict.iteritems(), key=itemgetter(1), reverse=True)

    The get idiom not only looks up the dict twice (as J. F. Sebastian has pointed out), it makes one list (result of sorted()) then iterates over that list to create a result list. I'd call that baroque, not simple. YMMV.

    0 讨论(0)
  • 2021-02-04 04:22

    To get the dictionary sorted:

    from operator import itemgetter
    
    sorted(adict.iteritems(), key=itemgetter(1), reverse=True)
    
    0 讨论(0)
  • 2021-02-04 04:22

    If you're using the newest python 2.7 alpha, then you can use the Counter class in collections module:

    c = Counter()
    
    c['someval'] += 1
    c['anotherval'] += 1
    c['someval'] += 1
    
    print c.most_common()
    

    prints in the correct order:

    [('someval', 2), ('anotherval', 1)]
    

    The code used on 2.7 is available already and there's a version adapted to 2.5. Perhaps you want to use it to stay forward compatible with the native stdlib version that is about to be released.

    0 讨论(0)
  • 2021-02-04 04:25

    A dict's keys, reverse-sorted by the corresponding values, can best be gotten as

    sorted(adict, key=adict.get, reverse=True)
    

    since you want key/value pairs, you could work on the items as all other answers suggest, or (to use the nifty adict.get bound method instead of itemgetters or weird lambdas;-),

    [(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]
    

    Edit: in terms of performance, there isn't much into it either way:

    $ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6))' '[(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]'
    100000 loops, best of 3: 10.8 usec per loop
    $ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6)); from operator import itemgetter' 'sorted(adict.iteritems(), key=itemgetter(1), reverse=True)'
    100000 loops, best of 3: 9.66 usec per loop
    $ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6))' 'sorted(adict.iteritems(), key=lambda (k,v): v, reverse=True)'
    100000 loops, best of 3: 11.5 usec per loop
    

    So, the .get-based solution is smack midway in performance between the two items-based ones -- slightly slower than the itemgetter, slightly faster than the lambda. In "bottleneck" cases, where those microsecond fractions are crucial to you, by all means do focus on that. In normal cases, where this operation is only one step within some bigger task and a microsecond more or less matters little, focusing on the simplicity of the get idiom is, however, also a reasonable alternative.

    0 讨论(0)
提交回复
热议问题