I have to count the word frequency in a text using Python. I thought of keeping the words in a dictionary, with a count for each word.
Now if I have to so
I have just written a similar program, with the help of the Stack Overflow guys:
from string import punctuation
from operator import itemgetter

N = 100
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("poi_run.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print("%s %d" % (word, frequency))
Didn't know there was a Counter object for such a task. Here's how I did it back then, similar to your approach. You can do the sorting on a representation of the same dictionary.
# Takes a list and returns a list of (word, count) pairs sorted by descending count
def countWords(a_list):
    words = {}
    for item in a_list:
        # list.count rescans the whole list, so this is quadratic in len(a_list)
        words[item] = a_list.count(item)
    return sorted(words.items(), key=lambda item: item[1], reverse=True)
An example:
>>> countWords("the quick red fox jumped over the lazy brown dog".split())
[('the', 2), ('brown', 1), ('lazy', 1), ('jumped', 1), ('over', 1), ('fox', 1), ('dog', 1), ('quick', 1), ('red', 1)]
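For reference, the Counter mentioned above lives in the standard library's collections module. A minimal sketch, assuming the text has already been split into a list of words; the function name count_words_counter is made up for illustration:

```python
from collections import Counter

def count_words_counter(a_list):
    """Return (word, count) pairs sorted by descending count."""
    # Counter tallies every item in a single pass over the list.
    counts = Counter(a_list)
    # most_common() with no argument returns all pairs, highest count first.
    return counts.most_common()

pairs = count_words_counter("the quick red fox jumped over the lazy brown dog".split())
print(pairs[0])  # ('the', 2)
```

Passing a number, as in counts.most_common(100), would keep only that many of the most frequent words.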
>>> d = {'a': 3, 'b': 1, 'c': 2, 'd': 5, 'e': 0}
>>> l = list(d.items())
>>> l.sort(key = lambda item: item[1])
>>> l
[('e', 0), ('b', 1), ('c', 2), ('a', 3), ('d', 5)]
You can use the same dictionary:
>>> d = { "foo": 4, "bar": 2, "quux": 3 }
>>> sorted(d.items(), key=lambda item: item[1])
The second line prints:
[('bar', 2), ('quux', 3), ('foo', 4)]
If you only want a sorted word list, do:
>>> [pair[0] for pair in sorted(d.items(), key=lambda item: item[1])]
That line prints:
['bar', 'quux', 'foo']
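As a possible shortcut for the word-list case, sorted can iterate over the dictionary's keys directly and use d.get as the key function. A small sketch with the same d as above; the names by_count and by_count_desc are only for illustration:

```python
d = {"foo": 4, "bar": 2, "quux": 3}

# Iterating a dict yields its keys; d.get maps each key to its count.
by_count = sorted(d, key=d.get)
# reverse=True puts the most frequent word first.
by_count_desc = sorted(d, key=d.get, reverse=True)

print(by_count)       # ['bar', 'quux', 'foo']
print(by_count_desc)  # ['foo', 'quux', 'bar']
```

This avoids building the intermediate list of pairs when only the words are needed.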