I'm familiar with word stemming and completion from the tm package in R.
I'm trying to come up with a quick and dirty method for finding all variants of a given
This solution requires preprocessing your corpus, but once that is done, finding the variants of a word is a very quick dictionary lookup.
from collections import defaultdict

from stemming.porter2 import stem

# Preprocessing: group every word in the corpus under its Porter2 stem.
with open('/usr/share/dict/words') as f:
    words = f.read().splitlines()

stems = defaultdict(list)
for word in words:
    word_stem = stem(word)
    stems[word_stem].append(word)

if __name__ == '__main__':
    # Lookup: stem the query word, then fetch every word sharing that stem.
    word = 'leukocyte'
    word_stem = stem(word)
    print(stems[word_stem])
For the /usr/share/dict/words corpus, this produces the result
['leukocyte', "leukocyte's", 'leukocytes']
It uses the stemming module, which can be installed with
pip install stemming
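
If you want to reuse the lookup, the same idea can be wrapped in two small functions. This is only a sketch: `build_stem_index`, `variants`, and `toy_stem` are names made up for illustration, and `toy_stem` is a crude stand-in that only strips a trailing `'s` or `s` so the example runs without any third-party package. In practice you would pass `stem` from `stemming.porter2` instead.

```python
from collections import defaultdict


def build_stem_index(words, stem_func):
    """Group words by their stem (the one-time preprocessing step)."""
    index = defaultdict(list)
    for word in words:
        index[stem_func(word)].append(word)
    return index


def variants(word, index, stem_func):
    """Return every indexed word that shares a stem with `word`."""
    # .get() avoids creating an empty entry for stems that are not indexed.
    return index.get(stem_func(word), [])


def toy_stem(word):
    """Hypothetical stand-in stemmer: lowercases and strips "'s" or "s"."""
    word = word.lower()
    for suffix in ("'s", "s"):
        if word.endswith(suffix):
            word = word[:-len(suffix)]
    return word


corpus = ["leukocyte", "leukocyte's", "leukocytes", "lettuce"]
index = build_stem_index(corpus, toy_stem)
print(variants("leukocytes", index, toy_stem))
# ['leukocyte', "leukocyte's", 'leukocytes']
```

Using `index.get(...)` rather than `index[...]` means that looking up a word whose stem is not in the corpus returns an empty list without adding an empty entry to the `defaultdict`.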