Question
I would like to get the lemma "dive" from every form of the word, using the textstem package in R.
But when I run textstem, the base form "dive" itself comes back as a very strange result.
library(textstem)
words <- c("dived", "diving", "dive")
lemmatize_strings(words, dictionary = lexicon::hash_lemmas)
[1] "dive" "dive" "diva"
Here, I do not want "diva" as the result for the word "dive". Instead, I need "dive" to be lemmatized to "dive", so that it is counted as the same word as the other forms "dived" and "diving". The output should look like this:
[1] "dive" "dive" "dive"
I found a related question (stemDocment in tm package not working on past tense word), but it might not be useful in my case, since I have to process more than 80,000 reviews and am highly likely to run into the same problem with other words.
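To make the problem concrete, here is a rough sketch of a dictionary-level workaround I am considering. It is not a confirmed fix. It assumes the lemma dictionary has columns named token and lemma (as I believe lexicon::hash_lemmas does), and drop_self_lemmas is a hypothetical helper name of my own. The idea is to drop every entry whose token already appears as the lemma of some other word, so that a base form like "dive" is left untouched instead of being mapped to "diva".

```r
# Hypothetical helper (not part of textstem): drop dictionary rows whose
# token is already used as a lemma elsewhere in the same dictionary.
# Assumes the dictionary has columns named `token` and `lemma`.
drop_self_lemmas <- function(dict) {
  dict <- as.data.frame(dict, stringsAsFactors = FALSE)
  dict[!(dict$token %in% dict$lemma), , drop = FALSE]
}

# Toy dictionary that mirrors the behaviour described above.
toy <- data.frame(
  token = c("dived", "diving", "dive"),
  lemma = c("dive", "dive", "diva"),
  stringsAsFactors = FALSE
)
fixed_toy <- drop_self_lemmas(toy)
print(fixed_toy)

# Applying the same filter to the real dictionary, if the packages are installed.
if (requireNamespace("textstem", quietly = TRUE) &&
    requireNamespace("lexicon", quietly = TRUE)) {
  fixed <- drop_self_lemmas(lexicon::hash_lemmas)
  words <- c("dived", "diving", "dive")
  print(textstem::lemmatize_strings(words, dictionary = fixed))
}
```

The obvious trade-off is that this filter is blunt: it would also discard legitimate mappings for any word that happens to be both an inflected form and a base form, so I am not sure it is safe to apply across the whole corpus.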
I use lemmatize_strings on my actual dataset as well, and it gives exactly the same result (though that is fairly obvious). Can anyone please help me?
Thank you very much in advance!
Source: https://stackoverflow.com/questions/50401056/strange-lemmatization-result-in-r-textstem-package