How to find all the related keywords for a root word?

Question

I am trying to find all the keywords that come from the same root word (in a sense, the opposite of stemming). I am currently using R, but I am open to switching to a different language if it helps.

For instance, I have the root word "rent" and I would like to be able to find "renting", "renter", "rental", "rents" and so on.

Answer 1:

Try this code in Python:

from pattern.en import lexeme
print(lexeme("rent"))

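This prints the inflected forms of the verb "rent". Note that lexeme() only conjugates verbs, so derived words such as "renter" or "rental" are not included.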
Installation:
pip install pattern
pip install nltk
Next, start Python (type python in a terminal) and run the code below to download the required NLTK data:

import nltk
nltk.download(["wordnet","wordnet_ic","sentiwordnet"])

After the installation is done, run the pattern code again.

Answer 2:

You want the opposite of stemming, but stemming itself can be your way in.

Look at this example in Python:

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
words = ["renting", "renter", "rental", "rents", "apple"]
all_rents = {}
for word in words:
    stem = stemmer.stem(word)
    # group each word under its stem, creating the list on first use
    all_rents.setdefault(stem, []).append(word)
print(all_rents)

Result:

{'rent': ['renting', 'rents'], 'renter': ['renter'], 'rental': ['rental'], 'appl': ['apple']}

There are several other stemming algorithms you can use. However, keep in mind that stemmers are rule-based and are not "smart" enough to group every related word together (as seen above, "renter" and "rental" each end up with their own stem). You can even implement your own rules by extending NLTK's stemmer API (nltk.stem.api).
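To make the "opposite of stemming" idea concrete, here is a minimal sketch under stated assumptions: index a word list by stem, then return every word that shares the root's stem. The suffix list in toy_stem is hypothetical and only illustrates what custom rules might look like; it is not NLTK's Porter algorithm. The stem parameter accepts any callable, so you could pass PorterStemmer().stem instead, although, as the result above shows, Porter keeps "renter" and "rental" apart.

```python
from collections import defaultdict

# Hypothetical toy rules (NOT NLTK's Porter stemmer): tried in this order.
SUFFIXES = ["ing", "al", "er", "ed", "s"]
MIN_STEM_LEN = 3  # assumption: never strip a word down to fewer than 3 letters


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM_LEN letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LEN:
            return w[: -len(suffix)]
    return w


def build_index(vocabulary, stem=toy_stem):
    """Map each stem to the list of vocabulary words that reduce to it."""
    index = defaultdict(list)
    for word in vocabulary:
        index[stem(word)].append(word)
    return index


def related_words(root, vocabulary, stem=toy_stem):
    """Reverse lookup: all vocabulary words that share the root's stem."""
    # dict.get does not trigger defaultdict's factory, so misses return []
    return build_index(vocabulary, stem).get(stem(root), [])


words = ["renting", "renter", "rental", "rents", "apple", "rent"]
print(related_words("rent", words))
```

Like any rule-based stemmer, these toy rules will also over-strip some words (for example, "water" becomes "wat"), so the suffix list would need tuning for a real vocabulary.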

Read more about the stemmers available in NLTK (the library used in the example above) here: https://www.nltk.org/api/nltk.stem.html

You can also implement your own algorithm. For example, you can use Levenshtein distance (as proposed in @noski's comment) or the length of the common prefix to measure how closely a word matches the root. However, you will have to do your own research on this one, since it is a more involved process.
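The edit-distance idea can be sketched as follows, assuming you already have a plain word list to search. levenshtein is the standard dynamic-programming edit distance (insertions, deletions and substitutions each cost 1); related_by_distance, its max_distance=3 default, and the optional prefix check are hypothetical choices for illustration, not a tested recipe.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between strings a and b."""
    # prev[j] = distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def related_by_distance(root, vocabulary, max_distance=3, require_prefix=False):
    """Hypothetical helper: words within max_distance edits of root."""
    hits = []
    for word in vocabulary:
        # optional common-prefix check: the whole root must start the word
        if require_prefix and not word.startswith(root):
            continue
        if levenshtein(root, word) <= max_distance:
            hits.append(word)
    return hits


words = ["renting", "renter", "rental", "rents", "apple", "brent"]
print(related_by_distance("rent", words))
print(related_by_distance("rent", words, require_prefix=True))
```

With the default settings, "brent" is returned alongside the "rent" words because it is only one insertion away; passing require_prefix=True drops it, and at that point the distance simply equals the number of extra letters after the root.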

Answer 3:

For an R answer, you can try these functions as a starting point. d.b gives grepl as an example; here are a few more:

words =  c("renting", "renter", "rental", "rents", "apple", "brent")
grepl("rent", words) # TRUE TRUE TRUE TRUE FALSE TRUE
startsWith(words, "rent") # TRUE TRUE TRUE TRUE FALSE FALSE
endsWith(words, "rent") # FALSE FALSE FALSE FALSE FALSE TRUE
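Since the question is open to switching languages, here is a direct Python translation of the three R calls above on the same word list, offered as a comparison sketch. R's grepl treats its pattern as a regular expression, so re.search is the closest equivalent; startsWith and endsWith map to Python's built-in string methods.

```python
import re

words = ["renting", "renter", "rental", "rents", "apple", "brent"]

# grepl("rent", words): regex match anywhere in the word
contains = [re.search("rent", w) is not None for w in words]
# startsWith(words, "rent")
starts = [w.startswith("rent") for w in words]
# endsWith(words, "rent")
ends = [w.endswith("rent") for w in words]

print(contains)  # [True, True, True, True, False, True]
print(starts)    # [True, True, True, True, False, False]
print(ends)      # [False, False, False, False, False, True]
```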

Source: https://stackoverflow.com/questions/58066049/how-to-find-all-the-related-keywords-for-a-root-word

Tags

nlp

stemming