Python script to find word frequencies of a given document

青春惊慌失措 2021-01-03 19:16

I am looking for a simple script that finds the word frequencies of a given document (probably using a portable stemmer).

Is there a library or simple script that does this?

2 Answers
  • 2021-01-03 19:27

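    Use nltk:

    import nltk
    
    YOUR_STRING = "Your words"
    
    # Naive whitespace tokenization
    words = YOUR_STRING.split()
    freq_dist = nltk.FreqDist(words)
    
    # keys() is not sorted by frequency in current NLTK versions,
    # so rank the words explicitly, most frequent first
    tokens = [word for word, count in freq_dist.most_common()]
    
    # 50 most frequent
    most_frequent = tokens[:50]
    
    # 50 least frequent
    least_frequent = tokens[-50:]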
    
  • 2021-01-03 19:40

    You should be able to count words. Use a collections.Counter or a dict, depending on what you need. That part is easy; if it isn't, you can find the answer by searching SO itself.

    I think you also want the Porter Stemmer, which has a Python version at http://tartarus.org/~martin/PorterStemmer/python.txt
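
The counting approach suggested in this answer might look roughly like the sketch below. It is only an illustration under a few assumptions: the function name word_frequencies, the regex tokenizer, and the optional stem hook are all made up for this example and are not part of any library. Any callable that maps a word to its stem can be passed as stem, for instance nltk.stem.PorterStemmer().stem if NLTK is installed.

```python
import re
from collections import Counter


def word_frequencies(text, stem=None):
    """Count how often each word occurs in `text`.

    `stem` is a hypothetical optional hook: a callable that maps a word
    to its stem. When it is None, words are counted unstemmed.
    """
    # Lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    if stem is not None:
        words = [stem(w) for w in words]
    return Counter(words)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was flat."
    counts = word_frequencies(sample)
    # Two most frequent words with their counts
    print(counts.most_common(2))  # [('the', 3), ('mat', 2)]
```

To process a document on disk, read it into a string first (for example with open(path, encoding="utf-8") and read()) and pass that string to the function. A plain dict would also work for counting, but Counter already provides most_common for ranking.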
