Efficiently count word frequencies in Python
I'd like to count the frequencies of all words in a text file.

    >>> countInFile('test.txt')

should return {'aaa': 1, 'bbb': 2, 'ccc': 1} if the target text file looks like this:

    # test.txt
    aaa bbb ccc bbb

I've implemented it in pure Python, following some posts. However, I've found that pure-Python approaches are too slow because the file is huge (> 1 GB).

I think borrowing sklearn's power is one option. If you let CountVectorizer count frequencies for each line, I guess you can get the word frequencies by summing up each column. But that sounds like a rather indirect way.

What is the most efficient and straightforward way?
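For reference, here is a minimal sketch of a streaming pure-Python baseline, not necessarily what the posts mentioned above do. It assumes words are separated by whitespace and that the file is UTF-8 encoded. The name `count_in_file` and the `encoding` parameter are hypothetical, chosen only for illustration. The file is read line by line, so memory use depends on the number of distinct words rather than on the file size.

```python
from collections import Counter


def count_in_file(path, encoding="utf-8"):
    """Return a Counter mapping each whitespace-separated word to its count."""
    counts = Counter()
    # Iterating over the file object yields one line at a time,
    # so the whole file is never held in memory at once.
    with open(path, encoding=encoding) as f:
        for line in f:
            # str.split() with no arguments splits on any run of whitespace.
            counts.update(line.split())
    return counts
```

Calling `dict(count_in_file('test.txt'))` gives a plain dictionary if the `Counter` type is not wanted. Whether this is fast enough for a file over 1 GB depends on the machine and the data, so it is worth timing before reaching for a heavier tool.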