Sorted word frequency count using Python

前端未结


I have to count word frequencies in a text using Python. I thought of keeping the words in a dictionary with a count for each word.

Now if I have to so


10 answers

小蘑菇

2020-11-28 06:25

I just wrote a similar program, with help from the folks on Stack Overflow:

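from string import punctuation
from operator import itemgetter

N = 100  # number of most frequent words to print
words = {}

with open("poi_run.txt") as f:  # the with block closes the file when done
    # Split each line on whitespace, strip surrounding punctuation, lowercase
    words_gen = (word.strip(punctuation).lower()
                 for line in f
                 for word in line.split())
    for word in words_gen:
        if word:  # skip tokens that were nothing but punctuation
            words[word] = words.get(word, 0) + 1

# Sort (word, count) pairs by count, highest first, and keep the top N
top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print("%s %d" % (word, frequency))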
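If only the top N words are needed, sorting every distinct word is not strictly necessary. The sketch below is one possible variation, not the answer's own code: it uses heapq.nlargest in place of sorted(...)[:N] and takes any iterable of lines, so a file object works as well as a list of strings. The function name most_frequent and the sample lines are hypothetical.

```python
import heapq
from collections import defaultdict
from string import punctuation


def most_frequent(lines, n):
    """Return the n most frequent words in `lines` as (word, count) pairs."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            # Same normalisation as above: strip edge punctuation, lowercase
            word = word.strip(punctuation).lower()
            if word:  # skip tokens that were pure punctuation
                counts[word] += 1
    # nlargest keeps a heap of at most n items instead of sorting every word
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = ["The cat sat.", "The dog sat, too!"]  # hypothetical input
    for word, frequency in most_frequent(sample, 2):
        print("%s %d" % (word, frequency))
```

With a real file you would call it as `with open("poi_run.txt") as f: most_frequent(f, N)`. For small vocabularies the plain sort is just as good; the heap only pays off when N is much smaller than the number of distinct words.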


忘了有多久

2020-11-28 06:30

I didn't know there was a Counter object (collections.Counter) for this task. Here's how I did it back then, similar to your approach: you can sort a representation of the same dictionary (its items) instead of building a second dictionary.

# Takes a list and returns a list of (word, count) pairs sorted by count, descending
def countWords(a_list):
    words = {}
    for item in a_list:
        # Count in a single pass instead of calling a_list.count() for every item
        words[item] = words.get(item, 0) + 1
    return sorted(words.items(), key=lambda item: item[1], reverse=True)

An example:

>>> countWords("the quick red fox jumped over the lazy brown dog".split())
[('the', 2), ('quick', 1), ('red', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('lazy', 1), ('brown', 1), ('dog', 1)]
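Since collections.Counter came up: it does the counting and can return the pairs already ordered by frequency. A short sketch reusing the sentence from the example above:

```python
from collections import Counter

text = "the quick red fox jumped over the lazy brown dog"

# Counter tallies hashable items in a single pass
counts = Counter(text.split())

# most_common(n) returns the n (word, count) pairs with the highest counts;
# called with no argument it returns all pairs, most frequent first
print(counts.most_common(1))  # [('the', 2)]
print(counts.most_common())
```

Counter is a dict subclass, so `counts["fox"]` works as with a plain dictionary, and missing words return 0 instead of raising KeyError.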


孤城傲影

2020-11-28 06:33

>>> d = {'a': 3, 'b': 1, 'c': 2, 'd': 5, 'e': 0}
>>> l = list(d.items())  # in Python 3, items() is a view; list() makes it sortable
>>> l.sort(key=lambda item: item[1])
>>> l
[('e', 0), ('b', 1), ('c', 2), ('a', 3), ('d', 5)]
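For word counts you usually want the most frequent words first and a predictable order among ties. A sketch of one common way to do that with a tuple key; the extra 'f': 3 entry is hypothetical, added only to create a tie:

```python
d = {'a': 3, 'b': 1, 'c': 2, 'd': 5, 'e': 0, 'f': 3}

# Sort by count descending, then alphabetically among equal counts.
# Negating the count gives descending order without reverse=True,
# so the word part of the key still sorts ascending.
ordered = sorted(d.items(), key=lambda item: (-item[1], item[0]))
print(ordered)  # [('d', 5), ('a', 3), ('f', 3), ('c', 2), ('b', 1), ('e', 0)]
```

The negation trick only works for numeric keys; for the word itself you rely on the normal ascending order.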


刺人心

2020-11-28 06:39

You can use the same dictionary:

>>> d = { "foo": 4, "bar": 2, "quux": 3 }
>>> sorted(d.items(), key=lambda item: item[1])

The second line prints:

[('bar', 2), ('quux', 3), ('foo', 4)]

If you only want a sorted word list, do:

>>> [pair[0] for pair in sorted(d.items(), key=lambda item: item[1])]

That line prints:

['bar', 'quux', 'foo']
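Two related idioms, assuming Python 3.7 or later for the second one (that is when dicts officially started preserving insertion order). The first is a shorter way to get the sorted word list; the second rebuilds a dictionary whose iteration order follows the counts. Note that the second builds a new dict; a dict cannot be sorted in place.

```python
d = {"foo": 4, "bar": 2, "quux": 3}

# Iterating a dict yields its keys, so sort the keys by their values
words = sorted(d, key=d.get)
print(words)  # ['bar', 'quux', 'foo']

# A new dict built from sorted items iterates in that order (highest count first here)
by_freq = dict(sorted(d.items(), key=lambda item: item[1], reverse=True))
print(by_freq)  # {'foo': 4, 'quux': 3, 'bar': 2}
```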

