Best way to turn word list into frequency dict


What's the best way to convert a list/tuple into a dict where the keys are the distinct values of the list and the values are the frequencies of those distinct values?

8 answers

Happy的楠姐

2020-12-03 17:30

Just a note that, starting with Python 2.7/3.1, this functionality is built into the collections module; see this bug for more information. Here's the example from the release notes:

```
>>> from collections import Counter
>>> c=Counter()
>>> for letter in 'here is a sample of english text':
...   c[letter] += 1
...
>>> c
Counter({' ': 6, 'e': 5, 's': 3, 'a': 2, 'i': 2, 'h': 2,
'l': 2, 't': 2, 'g': 1, 'f': 1, 'm': 1, 'o': 1, 'n': 1,
'p': 1, 'r': 1, 'x': 1})
>>> c['e']
5
>>> c['z']
0
```
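For the word-list case in the question, the explicit loop is not needed, since `Counter` accepts any iterable directly. A minimal sketch, where `words` is a made-up sample list rather than anything from the question:

```python
from collections import Counter

# Hypothetical sample input; any iterable of hashable items works.
words = ["apple", "pear", "apple", "fig", "pear", "apple"]

freq = Counter(words)        # counts every distinct word in one pass
print(freq["apple"])         # 3
print(freq.most_common(2))   # [('apple', 3), ('pear', 2)]

# Counter is a dict subclass; convert if a plain dict is required.
plain = dict(freq)
```

Unlike a plain dict, `freq["missing"]` returns 0 instead of raising `KeyError`.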


误落风尘

2020-12-03 17:31
I find that the easiest way to understand (though it might not be the most efficient) is:
```
{i:words.count(i) for i in set(words)}
```
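Note that this comprehension calls `words.count` once per distinct word, so the list is rescanned for every key. If that becomes a problem, a single-pass version with a plain dict could look roughly like the sketch below (`words` here is sample data, not from the question):

```python
words = ["a", "b", "b", "a", "b", "c"]  # hypothetical sample input

# One full scan per distinct word: simple, but quadratic in the worst case.
by_count = {w: words.count(w) for w in set(words)}

# Single pass: dict.get supplies 0 the first time a word is seen.
single_pass = {}
for w in words:
    single_pass[w] = single_pass.get(w, 0) + 1

print(by_count == single_pass)  # True
```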
野趣味

2020-12-03 17:35
Kind of
```
from collections import defaultdict
fq = defaultdict(int)
for w in words:
    fq[w] += 1
```
That usually works nicely.
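One thing to keep in mind with this approach: indexing a word that was never counted inserts it with a count of 0. A small sketch of that behaviour, and of freezing the result into a plain dict afterwards (the `words` list is made up for illustration):

```python
from collections import defaultdict

words = ["spam", "egg", "spam"]  # hypothetical sample input

fq = defaultdict(int)
for w in words:
    fq[w] += 1

print(fq.get("ham", 0))  # 0; .get() does not add the key
print("ham" in fq)       # False
print(fq["ham"])         # 0; indexing a missing key inserts it
print("ham" in fq)       # True

del fq["ham"]            # drop the accidental entry again
result = dict(fq)        # plain dict: unknown words now raise KeyError
```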
傲寒

2020-12-03 17:35
This is an abomination, but:
```
from itertools import groupby
dict((k, len(list(xs))) for k, xs in groupby(sorted(items)))
```
I can't think of a reason one would choose this method over S.Lott's, but if someone's going to point it out, it might as well be me. :)
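The `sorted` call matters here: `groupby` only merges adjacent equal items, so with unsorted input the same key shows up several times and the dict would keep only the last run. A short sketch with a made-up `items` list:

```python
from itertools import groupby

items = ["b", "a", "b", "a", "b"]  # hypothetical sample input

# Without sorting, each run of equal neighbours becomes its own group.
runs = [(k, len(list(g))) for k, g in groupby(items)]
print(runs)   # [('b', 1), ('a', 1), ('b', 1), ('a', 1), ('b', 1)]

# Sorting first puts equal items next to each other.
freq = {k: len(list(g)) for k, g in groupby(sorted(items))}
print(freq)   # {'a': 2, 'b': 3}
```

The sort also means the items must be mutually comparable, which the dict-based approaches do not require.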

被撕碎了的回忆

2020-12-03 17:35

I have to share an interesting but kind of ridiculous way of doing it that I just came up with:

```
>>> class myfreq(dict):
...     def __init__(self, arr):
...         for k in arr:
...             self[k] = 1
...     def __setitem__(self, k, v):
...         dict.__setitem__(self, k, self.get(k, 0) + v)
...
>>> myfreq(['a', 'b', 'b', 'a', 'b', 'c'])
{'a': 2, 'c': 1, 'b': 3}
```
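A caveat with this trick, sketched below in Python 3 syntax: because `__setitem__` is overridden, every later assignment also adds to the stored count instead of replacing it. The class name `MyFreq` and the sample list are just for illustration.

```python
class MyFreq(dict):
    """Dict whose item assignment accumulates instead of overwriting."""

    def __init__(self, arr):
        super().__init__()
        for k in arr:
            self[k] = 1  # routed through the overridden __setitem__

    def __setitem__(self, k, v):
        super().__setitem__(k, self.get(k, 0) + v)


f = MyFreq(["a", "b", "b"])
f["a"] = 10   # adds 10 to the existing count of 1
print(f)      # {'a': 11, 'b': 2}
```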


隐瞒了意图╮

2020-12-03 17:45

I decided to go ahead and test the suggested versions. I found collections.Counter, as suggested by Jacob Gabrielson, to be the fastest, followed by the defaultdict version by S.Lott.

Here is my code:

```
from collections import Counter, defaultdict
import random


# using defaultdict
def counter_default_dict(data):
    count = defaultdict(int)
    for i in data:
        count[i] += 1
    return count


# using a plain dict
def counter_dict(data):
    count = {}
    for i in data:
        count.update({i: count.get(i, 0) + 1})
    return count


# using list.count in a dict comprehension
def counter_count(data):
    return {i: data.count(i) for i in set(data)}


# using collections.Counter
def counter_counter(data):
    return Counter(data)


# 300 random integers in [0, 250], sorted
data = sorted(random.randint(0, 250) for _ in range(300))


if __name__ == '__main__':
    from timeit import timeit
    print("collections.Defaultdict ", timeit("counter_default_dict(data)", setup="from __main__ import counter_default_dict, data", number=1000))
    print("Dict", timeit("counter_dict(data)", setup="from __main__ import counter_dict, data", number=1000))
    print("list.count ", timeit("counter_count(data)", setup="from __main__ import counter_count, data", number=1000))
    print("collections.Counter.count ", timeit("counter_counter(data)", setup="from __main__ import counter_counter, data", number=1000))
```

And my results:

```
collections.Defaultdict  0.06787874956330614
Dict 0.15979115872995675
list.count  1.199258431219126
collections.Counter.count  0.025896202538920665
```

Do let me know how I can improve the analysis.
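One possible refinement, offered as a sketch rather than a definitive benchmark: use `timeit.repeat` and keep the minimum of several runs to reduce noise, pass callables instead of import strings, and check that the candidates agree before timing them. The function names, input size, value range, and repeat counts below are arbitrary choices for illustration.

```python
import random
from collections import Counter, defaultdict
from timeit import repeat


def with_defaultdict(data):
    count = defaultdict(int)
    for item in data:
        count[item] += 1
    return count


def with_counter(data):
    return Counter(data)


def with_list_count(data):
    return {item: data.count(item) for item in set(data)}


random.seed(0)  # fixed seed so every run times the same input
data = [random.randint(0, 250) for _ in range(300)]

candidates = [with_defaultdict, with_counter, with_list_count]

# Sanity check: all candidates must produce identical counts.
expected = dict(Counter(data))
assert all(dict(f(data)) == expected for f in candidates)

for f in candidates:
    # Best of 5 repeats, each calling the function 200 times.
    best = min(repeat(lambda: f(data), number=200, repeat=5))
    print(f"{f.__name__:<18} {best:.4f} s")
```

Taking the minimum rather than the mean is the usual advice for `timeit` results, since slower runs mostly reflect interference from other processes. Trying several input sizes and different numbers of distinct values would also show how each approach scales.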
