Sorted word frequency count using Python

再見小時候 2020-11-28 06:00

I have to count the word frequency in a text using Python. I thought of keeping the words in a dictionary, with a count stored for each word.

Now if I have to so

10 Answers
  • 2020-11-28 06:25

    I have just written a similar program, with the help of the Stack Overflow guys:

    from string import punctuation
    from operator import itemgetter
    
    N = 100
    words = {}
    
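    # Count each word, lowercased and stripped of surrounding punctuation.
    with open("poi_run.txt") as text_file:
        for line in text_file:
            for word in line.split():
                word = word.strip(punctuation).lower()
                if word:  # skip tokens that were only punctuation
                    words[word] = words.get(word, 0) + 1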
    
    top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:N]
    
    for word, frequency in top_words:
        print("%s %d" % (word, frequency))
    
  • 2020-11-28 06:30

    Didn't know there was a Counter object for such a task. Here's how I did it back then, similar to your approach. You can do the sorting on a representation of the same dictionary.

    # Takes a list of words and returns (word, count) pairs sorted by count, descending
    def countWords(a_list):
        words = {}
        for item in a_list:
            # One pass over the list; a_list.count(item) would rescan it for every word
            words[item] = words.get(item, 0) + 1
        return sorted(words.items(), key=lambda item: item[1], reverse=True)
    

    An example (on Python 3.7+, words with equal counts stay in first-seen order):

    >>> countWords("the quick red fox jumped over the lazy brown dog".split())
    [('the', 2), ('quick', 1), ('red', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('lazy', 1), ('brown', 1), ('dog', 1)]
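
For comparison, here is a minimal sketch of the Counter approach mentioned at the top of this answer. It assumes the text is already in memory as a string; the function name `count_words` and the sample sentence are made up for illustration and are not from the original post.

```python
from collections import Counter
from string import punctuation


def count_words(text):
    """Return (word, count) pairs, most frequent first."""
    # Lowercase each whitespace-separated token and trim surrounding punctuation.
    tokens = (token.strip(punctuation).lower() for token in text.split())
    # Skip tokens that were nothing but punctuation.
    counts = Counter(token for token in tokens if token)
    # most_common() sorts by count, descending; equal counts keep first-seen order.
    return counts.most_common()


pairs = count_words("The quick red fox jumped over the lazy brown dog.")
print(pairs[0])  # ('the', 2)
```

Passing a number, as in `counts.most_common(10)`, would return only the ten most frequent words, which may be all that is needed for a large text.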
    
  • 2020-11-28 06:33
    >>> d = {'a': 3, 'b': 1, 'c': 2, 'd': 5, 'e': 0}
    >>> l = list(d.items())  # on Python 3, items() returns a view with no sort() method
    >>> l.sort(key=lambda item: item[1])
    >>> l
    [('e', 0), ('b', 1), ('c', 2), ('a', 3), ('d', 5)]
    
  • 2020-11-28 06:39

    You can use the same dictionary:

    >>> d = { "foo": 4, "bar": 2, "quux": 3 }
    >>> sorted(d.items(), key=lambda item: item[1])
    

    The second line prints:

    [('bar', 2), ('quux', 3), ('foo', 4)]
    

    If you only want a sorted word list, do:

    >>> [pair[0] for pair in sorted(d.items(), key=lambda item: item[1])]
    

    That line prints:

    ['bar', 'quux', 'foo']
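
The calls above sort in ascending order of count. If the most frequent words should come first, one possible variation is to negate the count in the sort key and use the word itself as a tie-breaker. This is only a sketch: the helper name `words_by_frequency` and the extra `"baz"` entry are hypothetical additions for illustration.

```python
def words_by_frequency(counts):
    """Return the words of `counts`, most frequent first, ties alphabetical."""
    # Negating the count sorts descending; the word itself breaks ties.
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ordered]


d = {"foo": 4, "bar": 2, "quux": 3, "baz": 2}
print(words_by_frequency(d))  # ['foo', 'quux', 'bar', 'baz']
```

Using a tuple key keeps the result deterministic when several words share a count, whereas `reverse=True` alone leaves ties in whatever order the dictionary yields them.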
    