Following examples from other Stack Overflow posts on word frequency analysis in Python, my program returns a letter frequency analysis rather than a word frequency analysis.
You can use a regex to find all the words (rather than going character by character, as you are now):
import re
...
commonWords = Counter(m.group(1) for m in re.finditer(r'\b(\w+)\b', contents))
You can use contents.split() to split the text on whitespace, but that will not separate words from punctuation. You will end up with separate counts for "word", "word," and "word.", which using a regex avoids.
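To make the difference concrete, here is a minimal sketch comparing the two approaches. The sample string assigned to contents is made up for illustration (in the real program it would be the text read from the file), and the names split_counts and regex_counts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample text, standing in for whatever was read from the file.
contents = "word word, word. other word"

# Splitting on whitespace leaves punctuation attached to the tokens,
# so "word", "word," and "word." are counted as three different keys.
split_counts = Counter(contents.split())

# The regex from the answer above only captures runs of word characters,
# so trailing commas and periods are not part of the match.
regex_counts = Counter(re.findall(r'\b(\w+)\b', contents))

print(split_counts)
print(regex_counts)
```

Note that \w also matches digits and underscores, so depending on the text you may want a stricter pattern.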
Counter(contents.split()) should use words instead ...
contents is a string, and strings in Python are iterable (i.e. strings behave like lists of letters in this context), so your Counter is counting letters.
You need to pass the Counter a list of words, not a string of letters.
Joran's answer shows how to do this using split().
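A small sketch of that point, assuming a made-up sample string for contents (the variable names letter_counts and word_counts are only for illustration):

```python
from collections import Counter

# Hypothetical sample text, standing in for whatever was read from the file.
contents = "to be or not to be"

# Iterating over a string yields one character at a time,
# so this counts letters (and spaces), not words.
letter_counts = Counter(contents)

# split() returns a list of words, so this counts words.
word_counts = Counter(contents.split())

print(letter_counts.most_common(2))
print(word_counts.most_common(2))
```

The only change between the two calls is what gets handed to Counter: a string in the first case, a list of strings in the second.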