Strange lemmatization result in R, textstem package

Submitted by 北城以北 on 2021-01-29 09:30:29

Question


I would like to get the lemma "dive" from all forms of the word using the textstem package in R.

But when I run textstem, the base form of the word itself is lemmatized to something unexpected.

library(textstem)
words <- c("dived", "diving", "dive")

lemmatize_strings(words, dictionary = lexicon::hash_lemmas)

[1] "dive" "dive" "diva"

Here, I do not want "diva" as the result for the word "dive". Instead, I need "dive" to be lemmatized to "dive", so that it is counted as the same word as the other forms "dived" and "diving". The result should look like this:

[1] "dive" "dive" "dive"

I found this link (stemDocment in tm package not working on past tense word), but it might not be useful in my case, since I have to process more than 80,000 reviews and am very likely to run into the same problem with other words.

I use lemmatize_strings on my dataset and it gives exactly the same result (which is to be expected). Can anyone please help me?

Thank you very much in advance!
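
For reference, one possible workaround is sketched below. It assumes the unexpected result comes from an entry in lexicon::hash_lemmas whose token is "dive", and that lemmatize_strings leaves a word unchanged when it has no entry in the dictionary. The name lemma_dict is only illustrative. The sketch copies the default dictionary, drops that entry, and passes the edited copy in:

```r
library(textstem)

words <- c("dived", "diving", "dive")

# Work on a plain data-frame copy so the packaged dictionary is not modified.
lemma_dict <- as.data.frame(lexicon::hash_lemmas, stringsAsFactors = FALSE)

# The first column holds the word forms. Drop every row whose form is "dive",
# so that "dive" is no longer mapped to a different lemma.
lemma_dict <- lemma_dict[lemma_dict[[1]] != "dive", ]

# Lemmatize with the edited dictionary instead of the default one.
result <- lemmatize_strings(words, dictionary = lemma_dict)
print(result)
```

This only addresses the one entry. For a large corpus, other surprising mappings would have to be found and removed (or overridden) in the same way, for example by inspecting the most frequent lemmas after a first pass.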

Source: https://stackoverflow.com/questions/50401056/strange-lemmatization-result-in-r-textstem-package
