Word frequency analysis in Python returning letter frequency

Following examples from other Stack Overflow posts on word frequency analysis in Python, my program is returning a letter frequency analysis instead of a word frequency analysis.

3 Answers

忘掉有多难

2021-01-27 01:38
You can use a regex to find all the words (rather than going character by character, which is what you are getting now):
```
import re
from collections import Counter

...

commonWords = Counter(m.group(1) for m in re.finditer(r'\b(\w+)\b', contents))
```
You could use contents.split() to split the text on whitespace, but that will not separate words from punctuation: you would get separate counts for "word", "word," and "word.", which the regex avoids.
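To make the difference concrete, here is a minimal sketch comparing the two approaches. The sample string assigned to contents is made up for illustration; in the question it would be the text read from the file.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the file contents.
contents = "word, word. word and another word"

# Splitting on whitespace leaves punctuation attached to the tokens.
split_counts = Counter(contents.split())

# The regex keeps only runs of word characters, so punctuation is dropped.
regex_counts = Counter(m.group(1) for m in re.finditer(r'\b(\w+)\b', contents))

print(split_counts)
print(regex_counts)
```

With split(), "word," and "word." are counted as separate keys; with the regex they all fall under "word".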

谎友^

2021-01-27 01:40
```
Counter(contents.split())
```
That should count words instead of letters.

隐瞒了意图╮

2021-01-27 01:49

contents is a string, and strings in Python are iterable: iterating over one yields its characters one at a time (in this context a string behaves like a list of letters), so your Counter is counting letters.

You need to pass the Counter a list of words, not a string of letters.

Joran's answer shows how to do this using split().
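As a rough illustration of the point above, the following sketch passes the same text to Counter once as a string and once as a list of words. The sample value of contents is an assumption, not the asker's data.

```python
from collections import Counter

# Hypothetical sample text; the question's contents comes from a file.
contents = "the cat and the hat"

# A string is iterated character by character (spaces included).
letter_counts = Counter(contents)

# A list of words is iterated word by word.
word_counts = Counter(contents.split())

print(letter_counts["t"])   # occurrences of the letter "t"
print(word_counts["the"])   # occurrences of the word "the"
```

The only change between the two calls is what gets iterated, which is why the original program reported letters.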
