Question
I am trying to figure out a way to find all the keywords that come from the same root word (in a sense, the opposite of stemming). I am currently using R, but I am open to switching to a different language if it helps.
For instance, I have the root word "rent" and I would like to be able to find "renting", "renter", "rental", "rents" and so on.
Answer 1:
Try this code in Python:
from pattern.en import lexeme
print(lexeme("rent"))
This prints the inflected forms of the verb "rent" as a list.
Installation:
pip install pattern
pip install nltk
Then open a terminal, start the Python interpreter by typing python, and run the code below:
import nltk
nltk.download(["wordnet","wordnet_ic","sentiwordnet"])
After the installation is done, run the pattern code again.
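Note that lexeme works with verb conjugations, so it returns inflected forms such as "rents" and "renting"; derived words like "renter" or "rental" are generally outside its scope. To illustrate what kind of forms that means, here is a rough rule-based sketch for regular verbs. The helper naive_lexeme is hypothetical and is not how pattern implements lexeme (pattern also handles irregular verbs and consonant doubling, which this sketch ignores):

```python
# Hypothetical helper: a rough sketch of regular English verb inflection.
# It is NOT pattern's implementation and ignores irregular verbs and
# consonant doubling ("stop" -> "stopping").

VOWELS = set("aeiou")


def naive_lexeme(verb: str) -> list[str]:
    """Return [base, 3rd-person singular, present participle, past] for a regular verb."""
    # Third person singular: "fix" -> "fixes", "carry" -> "carries", "rent" -> "rents".
    if verb.endswith(("s", "x", "z", "ch", "sh")):
        third = verb + "es"
    elif verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        third = verb[:-1] + "ies"
    else:
        third = verb + "s"

    # Present participle: drop a silent final "e" ("bake" -> "baking"), but keep "ee".
    if verb.endswith("e") and not verb.endswith("ee"):
        participle = verb[:-1] + "ing"
    else:
        participle = verb + "ing"

    # Past tense: "bake" -> "baked", "carry" -> "carried", "rent" -> "rented".
    if verb.endswith("e"):
        past = verb + "d"
    elif verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        past = verb[:-1] + "ied"
    else:
        past = verb + "ed"

    return [verb, third, participle, past]


print(naive_lexeme("rent"))   # ['rent', 'rents', 'renting', 'rented']
print(naive_lexeme("carry"))  # ['carry', 'carries', 'carrying', 'carried']
```

Irregular verbs break it immediately ("see" would yield "seed" instead of "saw"), which is why a library with a verb lexicon, such as pattern, is the better tool for real text.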
Answer 2:
You want the opposite of stemming, but stemming itself can be your way in.
Look at this example in Python:
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()
words = ["renting", "renter", "rental", "rents", "apple"]
all_rents = {}
for word in words:
    stem = stemmer.stem(word)
    if stem not in all_rents:
        all_rents[stem] = []
        all_rents[stem].append(word)
    else:
        all_rents[stem].append(word)
print(all_rents)
Result:
{'rent': ['renting', 'rents'], 'renter': ['renter'], 'rental': ['rental'], 'appl': ['apple']}
There are several other algorithms you can use. Keep in mind, however, that stemmers are rule-based and are not "smart" enough to group every related word together: in the output above, "renter" and "rental" each end up under their own stem instead of under "rent". You can also write your own rules by extending NLTK's stemmer interface (nltk.stem.api.StemmerI).
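As the Porter output shows, "renter" and "rental" land in separate groups. Below is a minimal sketch of a custom suffix-stripping stemmer that groups all four words under "rent". The SuffixStemmer class, its suffix list and the min_stem_len threshold are hypothetical, chosen only for this example. It does not import NLTK so that it runs on its own, but its single stem() method mirrors what you would implement when subclassing nltk.stem.api.StemmerI:

```python
from collections import defaultdict

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("ing", "er", "al", "ed", "s")


class SuffixStemmer:
    """Toy rule-based stemmer with a single stem() method, in the spirit of
    NLTK's StemmerI interface (but not a subclass of it)."""

    def __init__(self, suffixes=SUFFIXES, min_stem_len=3):
        # Check longer suffixes first so "ing" is tried before "s".
        self.suffixes = sorted(suffixes, key=len, reverse=True)
        self.min_stem_len = min_stem_len

    def stem(self, word: str) -> str:
        word = word.lower()
        for suffix in self.suffixes:
            # Strip only if a stem of at least min_stem_len characters remains,
            # so a short word like "sing" is left alone.
            if word.endswith(suffix) and len(word) - len(suffix) >= self.min_stem_len:
                return word[: -len(suffix)]
        return word


def group_by_stem(words, stemmer):
    """Map each stem to the list of words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


stemmer = SuffixStemmer()
words = ["renting", "renter", "rental", "rents", "apple"]
groups = group_by_stem(words, stemmer)
print(groups)  # {'rent': ['renting', 'renter', 'rental', 'rents'], 'apple': ['apple']}

# To go from a root to its related words, stem the root with the same rules.
print(groups[stemmer.stem("rent")])  # ['renting', 'renter', 'rental', 'rents']
```

The same crude rules also over-strip: "under" becomes "und". Any hand-written rule set needs testing against a real vocabulary before you rely on it.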
All of the stemmers available in NLTK (the library used in the example above) are documented here: https://www.nltk.org/api/nltk.stem.html
You can implement your own algorithm as well. For example, you could implement Levenshtein distance (as suggested in @noski's comment) and use it, together with a common-prefix check, to decide which words are related to a root. You will have to do your own research on this one, though, because it is not a trivial process.
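One way to read that suggestion is to combine a shared-prefix check with an edit-distance cap. The sketch below implements the standard dynamic-programming Levenshtein distance plus a hypothetical related_words filter. Its thresholds (the word must share all but at most one character of the root as a prefix, and max_distance=3) are arbitrary illustrative values, not something taken from @noski's comment:

```python
import os


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # prev holds the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def related_words(root: str, vocabulary: list[str], max_distance: int = 3) -> list[str]:
    """Hypothetical filter: keep words that share (almost) the whole root as a
    prefix and are within max_distance edits of it."""
    related = []
    for word in vocabulary:
        # os.path.commonprefix compares character by character, so it works on plain words.
        shared = os.path.commonprefix([root, word])
        if len(shared) >= len(root) - 1 and levenshtein(root, word) <= max_distance:
            related.append(word)
    return related


vocabulary = ["renting", "renter", "rental", "rents", "rented", "apple", "brent", "rend"]
print(related_words("rent", vocabulary))
# ['renting', 'renter', 'rental', 'rents', 'rented', 'rend']
```

Note that "rend" is matched as well: it shares "ren" with the root and is only one edit away. Thresholds like these trade recall against false positives and have to be tuned for your vocabulary.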
Answer 3:
For an R answer, you can try these functions as a starting point. d.b gives grepl as an example; here are a few more:
words = c("renting", "renter", "rental", "rents", "apple", "brent")
grepl("rent", words) # TRUE TRUE TRUE TRUE FALSE TRUE
startsWith(words, "rent") # TRUE TRUE TRUE TRUE FALSE FALSE
endsWith(words, "rent") # FALSE FALSE FALSE FALSE FALSE TRUE
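Since the question mentions being open to Python, here is a minimal sketch of the same three checks in Python, using the in operator, str.startswith and str.endswith, plus an anchored regular expression (the counterpart of grepl("^rent", words) in R). It uses the same word list and produces the same results as the R comments above:

```python
import re

words = ["renting", "renter", "rental", "rents", "apple", "brent"]

# Substring match, like grepl("rent", words): also catches "brent".
contains = ["rent" in w for w in words]
# Prefix match, like startsWith(words, "rent").
starts = [w.startswith("rent") for w in words]
# Suffix match, like endsWith(words, "rent").
ends = [w.endswith("rent") for w in words]

# Regex anchored at the start of the word, like grepl("^rent", words).
pattern = re.compile(r"^rent")
anchored = [bool(pattern.search(w)) for w in words]

print(contains)  # [True, True, True, True, False, True]
print(starts)    # [True, True, True, True, False, False]
print(ends)      # [False, False, False, False, False, True]
```

Like grepl, the plain substring test also matches "brent", so a prefix check is usually the safer filter for finding words built on a root.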
Source: https://stackoverflow.com/questions/58066049/how-to-find-all-the-related-keywords-for-a-root-word