Word counter using Python

失恋的感觉 2021-01-28 07:17

I wrote some code to count words in Python.

I wanted to get the text and the frequency of each word from the following page: http://www.holybible.or.kr/B_NIV/cgi/bibleftxt.php?VR=

1 answer
  • 2021-01-28 07:48

    You are building a fresh word_count dictionary for every verse and then printing the word_count for that verse only. Instead, you need a single word_count instance that accumulates the counts across all verses.
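The question's original code is not shown here, so the following is only a minimal sketch of the difference, under assumptions: the `verses` list and the function names `count_per_verse` and `count_all` are hypothetical stand-ins for the scraped text and the loop in the question.

```python
# Hypothetical sample data standing in for the scraped verses.
verses = [
    "in the beginning god created the heavens and the earth",
    "and god said let there be light",
]


def count_per_verse(verses):
    """Problem pattern: a new dictionary is created for every verse."""
    results = []
    for verse in verses:
        word_count = {}  # reset on every iteration, so totals never accumulate
        for word in verse.split():
            word_count[word] = word_count.get(word, 0) + 1
        results.append(word_count)
    return results


def count_all(verses):
    """Fixed pattern: one dictionary accumulates counts across all verses."""
    word_count = {}  # created once, before the loop
    for verse in verses:
        for word in verse.split():
            word_count[word] = word_count.get(word, 0) + 1
    return word_count


per_verse = count_per_verse(verses)
total = count_all(verses)
print(per_verse[-1]["god"])  # count within the last verse only
print(total["god"])          # count across both verses
```

Moving the dictionary creation out of the loop is the whole fix: the second function sees every verse before it returns, while the first one only ever knows about one verse at a time.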

    Update: The code had other problems as well. You should use a regular expression to remove all non-alphanumeric characters, and you should use collections.Counter: it makes the code a lot shorter and, as a nice side effect, lets you retrieve the most common words:

    import requests
    import re
    from bs4 import BeautifulSoup
    from collections import Counter
    
    
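    def parse(url):
        # Download the page and parse it with the built-in HTML parser.
        html = requests.get(url).text
        soup = BeautifulSoup(html, "html.parser")
        count = Counter()  # a single counter shared by all verses
        for bible_text in soup.find_all('font', {'class': 'tk4l'}):
            # Lowercase, then drop everything except word characters and spaces.
            text = re.sub(r"[^\w ]", "", bible_text.get_text().lower())
            # split() without arguments skips the empty strings that split(" ") yields.
            count.update(text.split())
        return count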
    
    word_count = parse('http://www.holybible.or.kr/B_NIV/cgi/bibleftxt.php?VR=NIV&VL=1&CN=1&CV=99')
    print(word_count.most_common(10))
    

    Output:

    [('the', 83), ('and', 71), ('god', 30), ('was', 29), ('to', 22), ('it', 17), ('of', 16), ('there', 16), ('that', 15), ('in', 15)]
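To try the cleaning and counting steps without a network request, here is a small offline sketch. The two entries in `lines` are hypothetical sample strings, not text fetched from the page, so the counts will not match the output above.

```python
import re
from collections import Counter

# Hypothetical sample lines; the real script reads them from the page.
lines = [
    "In the beginning God created the heavens and the earth.",
    "And God said, \"Let there be light,\" and there was light.",
]

count = Counter()
for line in lines:
    # Lowercase, then drop everything except word characters and spaces.
    cleaned = re.sub(r"[^\w ]", "", line.lower())
    # update() adds to the existing totals instead of replacing them.
    count.update(cleaned.split())

print(count.most_common(3))
```

Because punctuation is removed before splitting, "earth." and "earth" are counted as the same word, and `Counter.update` keeps one running total across both lines.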
    