How to return the top n most frequently occurring chars and their respective counts # e.g. 'aaaaaabbbbcccc', 2 should return [('a', 6), ('b'
Use collections.Counter(); it has a most_common() method that does just that:
>>> from collections import Counter
>>> counts = Counter('aaaaaabbbbcccc')
>>> counts.most_common(2)
[('a', 6), ('c', 4)]
Note that for both the above input and for 'aabc', both b and c have the same count, and both can be valid top contenders. Because both you and Counter sort by count then key in reverse, c is sorted before b.
If, instead of sorting in reverse, you used the negative count as the sort key, you'd sort b before c again:
list4.sort(key=lambda v: (-v[1], v[0]))
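To make the tie-breaking explicit, here is a minimal sketch that combines Counter with the negative-count sort key. The function name top_n_chars is hypothetical, and breaking ties alphabetically is an assumption about what the asker wants rather than something the question states outright.

```python
from collections import Counter


def top_n_chars(text, n):
    """Return the n most frequent characters as (char, count) pairs.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    counts = Counter(text)
    # Negating the count puts larger counts first; the character itself
    # is the secondary key, so equal counts sort in ascending order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


print(top_n_chars('aaaaaabbbbcccc', 2))  # [('a', 6), ('b', 4)]
print(top_n_chars('aabc', 2))            # [('a', 2), ('b', 1)]
```

This sorts every distinct character, which is fine for short strings; the result does not depend on how the underlying dictionary happens to order its keys.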
Note that Counter.most_common() does not actually sort when you ask for fewer items than there are keys in the counter; it uses a heapq-based algorithm instead to get only the top N items.
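The heap-based idea can be sketched with the standard heapq module. This is an illustration of the technique, not a copy of Counter's internals: the function name top_n_heap is hypothetical, and heapq.nsmallest with a (negative count, character) key is used here so that ties resolve alphabetically.

```python
import heapq
from collections import Counter


def top_n_heap(text, n):
    """Pick the n most frequent characters without sorting every key."""
    counts = Counter(text)
    # The smallest (-count, char) keys correspond to the largest counts,
    # with alphabetical order deciding between equal counts.
    return heapq.nsmallest(
        n, counts.items(), key=lambda pair: (-pair[1], pair[0])
    )


print(top_n_heap('aaaaaabbbbcccc', 2))  # [('a', 6), ('b', 4)]
```

For a handful of distinct characters the difference from a full sort is negligible; the heap approach mainly pays off when there are many keys and n is small.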
A little harder, but also works:
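text = "abbbaaaa"
counts = {}  # renamed from dict to avoid shadowing the built-in
# Iterating over a string yields one character at a time.
for char in text:
    # Start from 0 for a character seen for the first time.
    counts[char] = counts.get(char, 0) + 1
print(counts)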
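The loop above only builds the counts; it does not yet pick the top n. One possible way to finish it, assuming no use of collections at all, is sketched below. The function name top_n_manual is hypothetical.

```python
def top_n_manual(text, n):
    """Count characters with a plain dict, then return the n most frequent."""
    counts = {}
    for char in text:
        counts[char] = counts.get(char, 0) + 1
    # Order the characters by their count, highest first.
    ranked = sorted(counts, key=counts.get, reverse=True)
    return [(char, counts[char]) for char in ranked[:n]]


print(top_n_manual("abbbaaaa", 2))  # [('a', 5), ('b', 3)]
```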