Question
In natural language processing, we usually want to get the lemma of a token.
For example, we can map 'eaten' to 'eat' using WordNet lemmatization.
Are there any tools in Python that can invert this, turning a lemma into a specific inflected form?
For example, mapping 'go' to 'gone', given the target form 'eaten'.
PS: Someone mentioned that we have to store such mappings: "How to un-stem a word in Python?"
Answer 1:
Turning a base form such as a lemma into a situation-appropriate form is called realization (or "surface realization"). Example from Wikipedia:
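// Setup, assuming SimpleNLG v4 and its bundled default lexicon
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);
// Build the clause from base forms, then set grammatical features
NPPhraseSpec subject = nlgFactory.createNounPhrase("the", "woman");
subject.setPlural(true);
SPhraseSpec sentence = nlgFactory.createClause(subject, "smoke");
sentence.setFeature(Feature.NEGATED, true);
System.out.println(realiser.realiseSentence(sentence));
// output: "The women do not smoke."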
Libraries for this are used less often than lemmatizers, so you have fewer options and are less likely to find a well-developed one. The Wikipedia example is in Java because the most popular library supporting this is SimpleNLG.
A quick search found pynlg, though it doesn't seem to be actively maintained. Alternatively, you can use SimpleNLG via an HTTP JSON interface provided by the Python library nlgserv.
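If neither library fits, the stored-mapping idea mentioned in the question's PS can be sketched in plain Python. This is only a minimal illustration, not how SimpleNLG or pynlg work: the `OBSERVED` triples are hypothetical toy data, the Penn Treebank tags (VBN, VBD, VBZ) are an assumed way of naming forms, and `build_tables` and `inflect_like` are made-up helper names. In practice the triples would have to come from a tagged, lemmatized corpus.

```python
from collections import defaultdict

# Hypothetical toy data: (surface form, lemma, Penn Treebank tag) triples.
# A real table would be harvested from a tagged, lemmatized corpus.
OBSERVED = [
    ("eaten", "eat", "VBN"),
    ("ate", "eat", "VBD"),
    ("gone", "go", "VBN"),
    ("went", "go", "VBD"),
    ("goes", "go", "VBZ"),
]


def build_tables(triples):
    """Return two lookups: form -> set of tags, and (lemma, tag) -> form."""
    tags_of_form = defaultdict(set)
    form_of = {}
    for form, lemma, tag in triples:
        tags_of_form[form].add(tag)
        form_of[(lemma, tag)] = form
    return tags_of_form, form_of


def inflect_like(lemma, target_form, tags_of_form, form_of):
    """Inflect `lemma` with a tag seen on `target_form`; None if unknown."""
    for tag in sorted(tags_of_form.get(target_form, ())):
        form = form_of.get((lemma, tag))
        if form is not None:
            return form
    return None


TAGS_OF_FORM, FORM_OF = build_tables(OBSERVED)
print(inflect_like("go", "eaten", TAGS_OF_FORM, FORM_OF))  # gone
```

The lookup only covers pairs that were actually observed, so any lemma or form missing from the table yields `None`; that limitation is what a rule-based realiser such as SimpleNLG is meant to address.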
Source: https://stackoverflow.com/questions/45590278/how-to-inverse-lemmatization-process-given-a-lemma-and-a-token