I am looking for a proper solution to this question. It has been asked many times before, and I didn't find a single answer that suited. I need to use a corpus in N
Based on my experience, I found two options with NLTK:
1:
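from nltk.corpus import words
# Build the vocabulary once; words.words() is a list, so a set makes each lookup fast
vocab = set(words.words())
unknown_word = []
if token not in vocab:
    unknown_word.append(token)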
2:
from nltk.corpus import wordnet
unknown_word = []
# A token with no WordNet synsets is treated as unknown
if not wordnet.synsets(token):
    unknown_word.append(token)
Option 2 performs better in my experience: it captures more of the relevant words.
I would recommend going with option 2.
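
To try both options on a whole list of tokens, something like the sketch below might help. The names find_unknown_words, make_words_checker and make_wordnet_checker are my own, not part of NLTK, and lowercasing the tokens in option 1 is an assumption that may not suit every use case. It also assumes the corpora are already downloaded with nltk.download('words') and nltk.download('wordnet').

```python
def find_unknown_words(tokens, is_known):
    """Return the tokens that is_known() rejects, in first-seen order, without duplicates."""
    unknown = []
    seen = set()
    for token in tokens:
        if token in seen:
            continue  # each distinct token is checked only once
        seen.add(token)
        if not is_known(token):
            unknown.append(token)
    return unknown


def make_words_checker():
    """Option 1: a token is known if it appears in the NLTK `words` corpus."""
    from nltk.corpus import words  # imported lazily so the helper above works without NLTK
    vocab = {w.lower() for w in words.words()}  # build the set once for fast lookups
    # Assumption: compare case-insensitively
    return lambda token: token.lower() in vocab


def make_wordnet_checker():
    """Option 2: a token is known if WordNet has at least one synset for it."""
    from nltk.corpus import wordnet
    return lambda token: len(wordnet.synsets(token)) > 0
```

You would then call, for example, find_unknown_words(my_tokens, make_wordnet_checker()) for option 2, or pass make_words_checker() instead to compare against option 1 on the same tokens.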