Question
Why does the Porter stemming algorithm online at http://text-processing.com/demo/stem/ stem "fried" to "fri" and not "fry"?
I can't recall any English past-tense words ending in "ied" whose base form ends in "i".
Is this a bug?
Answer 1:
A stem as returned by Porter Stemmer is not necessarily the base form of a verb, or a valid word at all. If you're looking for that, you need to look for a lemmatizer instead.
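To see why the result is "fri", it may help to look at the one Porter rule that fires for this word: a trailing "ed" is removed whenever the remaining stem contains a vowel. The snippet below is only a rough sketch of that single rule, not the full algorithm. The helper names (`is_vowel`, `contains_vowel`, `strip_ed`) are made up for illustration, and the sketch leaves out the separate "eed" rule and the cleanup Porter applies after stripping.

```python
def is_vowel(word, i):
    """Porter's vowel test: a, e, i, o, u, or a 'y' that follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return True
    return ch == "y" and i > 0 and not is_vowel(word, i - 1)


def contains_vowel(stem):
    return any(is_vowel(stem, i) for i in range(len(stem)))


def strip_ed(word):
    """Illustrative sketch of the 'ed' rule only (hypothetical helper)."""
    if word.endswith("eed"):
        # The real algorithm handles 'eed' with a different rule; skipped here.
        return word
    if word.endswith("ed") and contains_vowel(word[:-2]):
        return word[:-2]
    return word


print(strip_ed("fried"))      # fri   (stem 'fri' contains the vowel 'i')
print(strip_ed("plastered"))  # plaster
print(strip_ed("bled"))       # bled  (stem 'bl' has no vowel, so nothing is removed)
```

The rule only strips the suffix; nothing in it tries to turn the leftover "i" back into a "y", so the stem stays "fri".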
Answer 2:
First, a stemmer is not a lemmatizer (see also Stemmers vs Lemmatizers):
>>> from nltk.stem import PorterStemmer, WordNetLemmatizer
>>> porter = PorterStemmer()
>>> wnl = WordNetLemmatizer()
>>> fried = 'fried'
>>> porter.stem(fried)
u'fri'
>>> wnl.lemmatize(fried)
'fried'
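Next, a lemmatizer is part-of-speech (POS) sensitive. WordNetLemmatizer treats its input as a noun by default, which is why 'fried' came back unchanged above; pass pos='v' to lemmatize it as a verb: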
>>> wnl.lemmatize(fried, pos='v')
u'fry'
Source: https://stackoverflow.com/questions/27659179/porter-stemming-of-fried