Is there any way to get the base word instead of the root word when stemming with NLP in R?
Code:
> #Loading libraries
> library(tm)
> library(slam)
>
Unless you have a good knowledge of English morphology, you are better off using an existing library than writing your own stemmer.
English is full of morphological irregularities that trip up both probabilistic and rule-based models.
One example is I-umlaut: words like men, geese, feet, best, and a host of others (all with an 'e'-like sound) cannot be stemmed simply by stripping a suffix. Borrowed foreign words, like automaton (plural automata), may also be an issue.
Superlative forms are a good example of such exceptions:
best -> good
eldest -> old
A lemmatizer would account for such exceptions, but would be slower. You can look at the Porter stemmer rules to get an idea of what you would need to implement, or you can just use the SnowballC R package, which provides that stemmer.
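As a rough sketch of how the two ideas could be combined, the snippet below runs SnowballC's `wordStem()` and then overrides the result for a few irregular forms. The `irregular` table and the `stem_with_exceptions()` helper are made up for illustration and are not part of SnowballC; a real lemmatizer would need a far larger exception list.

```r
# Requires the SnowballC package: install.packages("SnowballC")
library(SnowballC)

# Hypothetical exception table (irregular form -> base word).
# Only a handful of entries, for illustration; extend as needed.
irregular <- c(best = "good", eldest = "old", men = "man",
               geese = "goose", feet = "foot")

# Illustrative helper (not a SnowballC function): stem every word,
# then replace the stem of any word found in the exception table.
stem_with_exceptions <- function(words, exceptions = irregular) {
  words <- tolower(words)
  stems <- wordStem(words, language = "english")
  hit <- words %in% names(exceptions)
  stems[hit] <- unname(exceptions[words[hit]])
  stems
}

stem_with_exceptions(c("best", "geese", "running", "walked"))
```

Words in the table come back as their base word, while everything else falls through to the regular stemmer, so the output for unlisted words is still a stem rather than a dictionary form.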