Word frequency analysis in Python returning letter frequency

野的像风 2021-01-27 01:25

Following examples from other Stack Overflow posts on word frequency analysis in Python, my program is returning letter frequencies rather than word frequencies.

3 Answers
  • 2021-01-27 01:38

    You can use a regex to find all the words (instead of going character by character, as you are now):

    import re
    
    ...
    
    commonWords = Counter(m.group(1) for m in re.finditer(r'\b(\w+)\b', contents))
    

    You could also use contents.split() to split the text on whitespace, but that will not separate words from punctuation. You would end up with separate counts for "word", "word," and "word." and so on, which the regex avoids.
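
To make the difference concrete, here is a minimal sketch comparing the two approaches. The sample string is made up for illustration and stands in for whatever the program reads into contents.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the file contents.
contents = "The word, the word. And the word"

# Splitting on whitespace leaves punctuation attached to the tokens,
# so "word", "word," and "word." are counted as three different keys.
split_counts = Counter(contents.split())

# The regex from the answer above: \w+ matches runs of letters, digits
# and underscores, so the surrounding punctuation is dropped.
regex_counts = Counter(m.group(1) for m in re.finditer(r'\b(\w+)\b', contents))

print(split_counts)
print(regex_counts)
```

Note that both versions still treat "The" and "the" as different words. Whether to lowercase the text first depends on what you want to count.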

  • 2021-01-27 01:40
    Counter(contents.split())
    

    This should count words instead of letters.

  • 2021-01-27 01:49

    contents is a string, and strings in Python are iterable (i.e. a string behaves like a list of its characters in this context), so your Counter is counting letters.

    You need to pass the Counter a list of words, not a string of letters.

    Joran's answer shows how to do this using split().
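
A small sketch of that difference, assuming a made-up sample string in place of the real file contents:

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents.
contents = "to be or not to be"

# Passing the string itself: Counter iterates it one character at a time,
# so the keys are single characters (including the spaces).
letter_counts = Counter(contents)

# Passing a list of words: Counter iterates the list, so the keys are words.
word_counts = Counter(contents.split())

print(letter_counts.most_common(3))
print(word_counts.most_common(2))
```

The only change between the two calls is what gets handed to Counter, which is why the original program runs without errors but reports letters.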
