Question
I am looking for a simple script that can find the frequencies of words in a given document (probably by using a portable stemmer).
Is there any library or simple script that does this process?
Answer 1:
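Use nltk:
import nltk

YOUR_STRING = "Your words"
# Naive whitespace tokenization
words = YOUR_STRING.split()
freq_dist = nltk.FreqDist(words)
# Words ordered from most to least frequent
# (in NLTK 3, keys() is neither sorted by count nor sliceable)
tokens = [word for word, count in freq_dist.most_common()]
# 50 most frequent
most_frequent = tokens[:50]
# 50 least frequent
least_frequent = tokens[-50:]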
Answer 2:
You should be able to count words. Use a collections.Counter or a dict, depending on what you need. That part is easy, but if it isn't, you can find the answer by searching on SO itself.
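For what it's worth, here is a minimal sketch of the counting part using only the standard library. The function name `word_frequencies` and the regex that decides what counts as a "word" are my own assumptions, not something from the question, so adjust them to your document.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word occurrences in text."""
    # Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


freqs = word_frequencies("The cat saw the other cat. The dog did not.")
# (word, count) pairs, highest count first
print(freqs.most_common(2))
```

For a real document you would pass in the text read from the file instead of the sample string. A plain dict works too, but Counter gives you most_common() for free.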
I think you also want the Porter Stemmer, which has a Python version at http://tartarus.org/~martin/PorterStemmer/python.txt
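If you want counts per stem rather than per surface form, one way to sketch it is to run every word through a stemming function before counting. The names `stemmed_frequencies` and `toy_stem` below are hypothetical, and `toy_stem` is only a stand-in so the example runs without extra packages; it is not the Porter algorithm. If NLTK is installed, `nltk.stem.PorterStemmer().stem` can be passed as the `stem` argument instead.

```python
from collections import Counter


def stemmed_frequencies(words, stem):
    """Count words after mapping each one through the stem callable."""
    return Counter(stem(w.lower()) for w in words)


def toy_stem(word):
    # Hypothetical stand-in, NOT the Porter algorithm:
    # strips a single trailing "s" from words longer than three letters.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


words = "Cats chase cats and dogs chase cat".split()
counts = stemmed_frequencies(words, toy_stem)
print(counts.most_common())
```

Keeping the stemmer as a parameter means the counting code does not change when you swap in a real stemmer.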
Source: https://stackoverflow.com/questions/7480000/python-script-to-find-word-frequencies-of-a-given-document