python, convert a dictionary to a sorted list by value instead of key


I have a collections.defaultdict(int) that I'm building to keep count of how many times a key shows up in a set of data. I later want to be able to sort it (obviously by turnin

7 Answers

春和景丽

2021-02-04 04:16

```
from collections import defaultdict
adict = defaultdict(int)

adict['a'] += 1
adict['b'] += 3
adict['c'] += 5
adict['d'] += 2

for key, value in sorted(adict.items(), lambda a, b: cmp(a[1], b[1]), reverse=True):
    print "%r => %r" % (key, value)
```
```
>>> 
'c' => 5
'b' => 3
'd' => 2
'a' => 1
```
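The comparator form above only runs on Python 2, since Python 3 removed `cmp()` and the positional comparison argument of `sorted()`. Here is a minimal sketch of the same ranking on Python 3, assuming `functools.cmp_to_key` is an acceptable bridge. The helper name `by_value` is made up for illustration and is not part of the original answer.

```python
from collections import defaultdict
from functools import cmp_to_key

adict = defaultdict(int)
adict['a'] += 1
adict['b'] += 3
adict['c'] += 5
adict['d'] += 2


def by_value(a, b):
    # Old-style comparator on (key, value) pairs: returns a negative,
    # zero, or positive number, like Python 2's cmp(a[1], b[1]).
    return (a[1] > b[1]) - (a[1] < b[1])


# cmp_to_key wraps the comparator so it can be passed as key=.
ranked = sorted(adict.items(), key=cmp_to_key(by_value), reverse=True)
for key, value in ranked:
    print("%r => %r" % (key, value))
```

In new code a plain key function is usually simpler. The comparator wrapper is mainly useful when porting existing Python 2 comparison logic unchanged.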

暗喜

2021-02-04 04:18

"Invert" a dictionary.

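```
from collections import defaultdict
inv_dict = defaultdict(list)
# Iterate over (key, value) pairs; iterating adict directly yields only keys.
for key, value in adict.items():
    inv_dict[value].append(key)
max_value = max(inv_dict.keys())
```

The set of keys with the maximum occurrence:

```
inv_dict[max_value]
```

The set of keys in descending order by occurrence:

```
# Sort the (value, key_list) pairs, highest count first.
for value, key_list in sorted(inv_dict.items(), reverse=True):
    print key_list, value
```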
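For readers on Python 3, the inversion steps above might be sketched as follows. The sample counts are made up, and the names `most_frequent` and `by_count_desc` are introduced here only for illustration.

```python
from collections import defaultdict

# Hypothetical sample counts; 'c' and 'e' tie for the maximum.
adict = {'a': 1, 'b': 3, 'c': 5, 'd': 2, 'e': 5}

# Map each count to the list of keys that have that count.
inv_dict = defaultdict(list)
for key, value in adict.items():
    inv_dict[value].append(key)

# Keys sharing the highest count.
max_value = max(inv_dict)
most_frequent = inv_dict[max_value]

# (count, keys) pairs, highest count first.
by_count_desc = sorted(inv_dict.items(), reverse=True)
for value, key_list in by_count_desc:
    print(key_list, value)
```

Grouping by count keeps tied keys together, which a flat sorted list of pairs does not make explicit.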

栀梦

2021-02-04 04:20

Note: I'm putting this in as an answer so that it gets seen. I don't want upvotes. If you want to upvote anyone, upvote Nadia.

The currently accepted answer gives timing results which are based on a trivially small dataset (size == 6 - (-5) == 11). The differences in cost of the various methods are masked by the overhead. A use case such as "what are the most frequent words in a text" or "what are the most frequent names in a membership list or census" involves much larger datasets.

Repeating the experiment with range(-n,n+1) (Windows box, Python 2.6.4, all times in microseconds):

n=5: 11.5, 9.34, 11.3
n=50: 65.5, 46.2, 68.1
n=500: 612, 423, 614

These results are NOT "slightly" different. The itemgetter answer is a clear winner on speed.

There was also mention of "the simplicity of the get idiom". Putting them close together for ease of comparison:

```
[(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]
sorted(adict.iteritems(), key=itemgetter(1), reverse=True)
```

The get idiom not only looks up the dict twice (as J. F. Sebastian has pointed out), it makes one list (result of sorted()) then iterates over that list to create a result list. I'd call that baroque, not simple. YMMV.
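To repeat this kind of comparison on your own machine, a rough Python 3 harness could look like the sketch below. It assumes `timeit` with a small, arbitrary repeat count, and the function names (`make_dict`, `via_get`, `via_itemgetter`, `via_lambda`) are invented for this example. Absolute numbers will differ from the Python 2.6.4 figures quoted above, so no expected timings are given.

```python
import timeit
from operator import itemgetter


def make_dict(n):
    # Same shape as the data in the thread: keys -n..n mapped to their squares.
    return {x: x * x for x in range(-n, n + 1)}


def via_get(d):
    # Sort the keys by value, then look each value up again.
    return [(k, d[k]) for k in sorted(d, key=d.get, reverse=True)]


def via_itemgetter(d):
    # Sort the (key, value) pairs directly on the value field.
    return sorted(d.items(), key=itemgetter(1), reverse=True)


def via_lambda(d):
    # Same as via_itemgetter, with a lambda as the key function.
    return sorted(d.items(), key=lambda kv: kv[1], reverse=True)


CALLS = 200  # arbitrary repeat count, kept small so the script finishes quickly
timings = {}
for n in (5, 50, 500):
    d = make_dict(n)
    for fn in (via_get, via_itemgetter, via_lambda):
        total = timeit.timeit(lambda: fn(d), number=CALLS)
        # Store the mean time per call in microseconds.
        timings[(n, fn.__name__)] = total / CALLS * 1e6

for (n, name), usec in sorted(timings.items()):
    print("n=%d %-15s %.2f usec" % (n, name, usec))
```

All three functions return the same list, because `sorted()` is stable and each one visits the entries in the dict's own order.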

陌清茗

2021-02-04 04:22
To get the dictionary sorted:
```
from operator import itemgetter

sorted(adict.iteritems(), key=itemgetter(1), reverse=True)
```
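On Python 3, `iteritems()` no longer exists and `items()` is used instead. The sketch below uses made-up sample counts to show the same call. It also shows one possible way to make the order of tied entries deterministic, which the original answer does not address.

```python
from operator import itemgetter

# Hypothetical sample counts; 'e' and 'b' tie at 3.
adict = {'a': 1, 'e': 3, 'c': 5, 'd': 2, 'b': 3}

# Highest count first; tied entries keep their original dict order,
# because sorted() is stable even with reverse=True.
ranked = sorted(adict.items(), key=itemgetter(1), reverse=True)

# Highest count first, ties broken alphabetically by key.
ranked_ties = sorted(adict.items(), key=lambda kv: (-kv[1], kv[0]))

print(ranked)
print(ranked_ties)
```

Negating the count in the second form only works for numeric values. For other value types, two successive stable sorts would be one alternative.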
猫巷女王i

2021-02-04 04:22
If you're using the newest Python 2.7 alpha, then you can use the Counter class in the collections module:
```
from collections import Counter

c = Counter()

c['someval'] += 1
c['anotherval'] += 1
c['someval'] += 1

print c.most_common()
```
prints in the correct order:
```
[('someval', 2), ('anotherval', 1)]
```
The code used on 2.7 is available already and there's a version adapted to 2.5. Perhaps you want to use it to stay forward compatible with the native stdlib version that is about to be released.
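In current Python versions `Counter` ships in the standard library, so counting and ranking can be done in one step. This is a short Python 3 sketch with a made-up word list, building the counter straight from an iterable and asking for only the top entries:

```python
from collections import Counter

# Made-up sample text, split into words.
words = "the cat and the hat and the bat".split()

# Counter accepts any iterable of hashable items and tallies them.
c = Counter(words)

# most_common(n) returns the n highest counts as (item, count) pairs.
top_two = c.most_common(2)
print(top_two)
```

Calling `most_common()` with no argument returns every entry, ordered from the most frequent to the least.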
死守一世寂寞

2021-02-04 04:25
A dict's keys, reverse-sorted by the corresponding values, can best be gotten as
```
sorted(adict, key=adict.get, reverse=True)
```
Since you want key/value pairs, you could work on the items as all the other answers suggest, or (to use the nifty adict.get bound method instead of itemgetters or weird lambdas ;-) build the pairs from the sorted keys:
```
[(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]
```
Edit: in terms of performance, there isn't much in it either way:
```
$ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6))' '[(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]'
100000 loops, best of 3: 10.8 usec per loop
$ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6)); from operator import itemgetter' 'sorted(adict.iteritems(), key=itemgetter(1), reverse=True)'
100000 loops, best of 3: 9.66 usec per loop
$ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6))' 'sorted(adict.iteritems(), key=lambda (k,v): v, reverse=True)'
100000 loops, best of 3: 11.5 usec per loop
```
So, the .get-based solution is smack midway in performance between the two items-based ones -- slightly slower than the itemgetter, slightly faster than the lambda. In "bottleneck" cases, where those microsecond fractions are crucial to you, by all means do focus on that. In normal cases, where this operation is only one step within some bigger task and a microsecond more or less matters little, focusing on the simplicity of the get idiom is, however, also a reasonable alternative.