Python script to find word frequencies of a given document

Posted by 谁都会走 on 2019-12-30 05:31:05

Question


I am looking for a simple script that finds the frequency of each word in a given document (probably using the Porter stemmer).

Is there any library or simple script that does this process?


Answer 1:


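Use nltk:

import nltk

YOUR_STRING = "Your words"

# Naive whitespace tokenization
words = YOUR_STRING.split()
freq_dist = nltk.FreqDist(words)

# In NLTK 3, FreqDist is a Counter subclass and keys() is no longer
# sorted by frequency, so rank the words with most_common() instead
tokens = [word for word, count in freq_dist.most_common()]

# 50 most frequent
most_frequent = tokens[:50]

# 50 least frequent
least_frequent = tokens[-50:]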



Answer 2:


You should be able to count words yourself: use a collections.Counter or a dict, depending on what you need. That part is easy, and if it isn't, searching Stack Overflow will turn up the answer.
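A minimal sketch of the collections.Counter approach, for illustration only. The function name `word_frequencies` and the regular-expression tokenizer are assumptions, not part of the original answer; a real document may need a more careful definition of "word".

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


freqs = word_frequencies("The cat sat. The cat ran!")

# most_common(n) returns the n highest-count (word, count) pairs.
top_two = freqs.most_common(2)
```

A plain dict works the same way with `counts[word] = counts.get(word, 0) + 1`, but Counter already provides `most_common()` for ranking.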

I think you also want the Porter Stemmer, which has a Python version at http://tartarus.org/~martin/PorterStemmer/python.txt
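To fold inflected forms into one entry, each token can be passed through a stemmer before it is counted. The sketch below takes the stemmer as a parameter. The names `stemmed_frequencies` and `strip_plural` are hypothetical, and `strip_plural` is only a toy stand-in that drops a trailing "s"; it is not the Porter algorithm. With NLTK installed, `nltk.stem.PorterStemmer().stem` could be passed in its place.

```python
import re
from collections import Counter


def strip_plural(word):
    """Toy stand-in for a real stemmer: drops one trailing 's' (NOT the Porter algorithm)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stemmed_frequencies(text, stem=strip_plural):
    """Count words after mapping each token through the `stem` callable."""
    # Assumption: tokens are runs of ASCII letters, compared case-insensitively.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(token) for token in tokens)


counts = stemmed_frequencies("Dogs chase cats; a dog chases a cat.")
```

Here "dogs" and "dog" end up under the same key, as do "chases" and "chase", so each pair contributes to a single count.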



Source: https://stackoverflow.com/questions/7480000/python-script-to-find-word-frequencies-of-a-given-document
