How to find out whether a word exists in English using NLTK

我寻月下人不归 2021-01-02 13:33

I am looking for a proper solution to this question. It has been asked many times before, and I didn't find a single answer that suited. I need to use a corpus in N

3 answers
  • 2021-01-02 14:08

    NLTK includes some corpora that are nothing more than wordlists. The Words Corpus is the /usr/share/dict/words file from Unix, used by some spell checkers. We can use it to find unusual or misspelled words in a text, as shown below:

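    import nltk

    def unusual_words(text):
        # Lowercased alphabetic tokens from the input string
        text_vocab = set(w.lower() for w in text.split() if w.isalpha())
        # Lowercased entries of the Words Corpus
        english_vocab = set(w.lower() for w in nltk.corpus.words.words())
        # Tokens that do not appear in the word list
        unusual = text_vocab - english_vocab
        return sorted(unusual)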
    

    In this case, you can check the membership of your word in english_vocab.

    >>> import nltk
    >>> english_vocab = set(w.lower() for w in nltk.corpus.words.words())
    >>> 'a' in english_vocab
    True
    >>> 'this' in english_vocab
    True
    >>> 'nothing' in english_vocab
    True
    >>> 'nothingg' in english_vocab
    False
    >>> 'corpus' in english_vocab
    True
    >>> 'Terminology'.lower() in english_vocab
    True
    >>> 'sorted' in english_vocab
    True
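
The membership test above can be wrapped in a small helper so that the vocabulary set is built only once and reused. This is a minimal sketch, not part of the original answer: `build_vocab`, `is_english_word`, and `load_nltk_vocab` are hypothetical names, and `load_nltk_vocab` assumes the Words Corpus has already been installed (for example with `nltk.download('words')`).

```python
def build_vocab(word_iterable):
    """Return a lowercase frozenset for fast membership tests."""
    return frozenset(w.lower() for w in word_iterable)


def is_english_word(word, vocab):
    """Case-insensitive lookup of `word` in a prebuilt vocabulary."""
    return word.lower() in vocab


def load_nltk_vocab():
    """Build the vocabulary from NLTK's Words Corpus.

    Assumption: the corpus is installed, e.g. via nltk.download('words').
    """
    import nltk  # imported lazily so the helpers above work without NLTK
    return build_vocab(nltk.corpus.words.words())


# Possible usage (requires NLTK and the 'words' corpus):
#   vocab = load_nltk_vocab()
#   is_english_word("Terminology", vocab)
```

Keeping the vocabulary as an argument means the expensive set construction happens once, and the same helper works with any other word list.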
    
  • 2021-01-02 14:08

    Based on my experience, I found two options with NLTK:

    1:

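    from nltk.corpus import words

    # Build the set once; membership tests on the raw list are slow
    english_words = set(words.words())
    unknown_word = []

    # `token` is the word being checked
    if token not in english_words:
        unknown_word.append(token)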
    

    2:

    from nltk.corpus import wordnet
    
    unknown_word = []
    
    if len(wordnet.synsets(token)) == 0:    
        unknown_word.append(token)
    

    In my experience, option 2 performs better: it captures more of the relevant words.

    I would recommend going with option 2.
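
Both options check a single `token`, but in practice the check usually runs over a whole list of tokens. The loop can be sketched as below. This is an illustration under stated assumptions rather than the answerer's code: `find_unknown_words` and `wordnet_is_known` are hypothetical names, the predicate is passed in so that either option can be plugged in, and `wordnet_is_known` assumes the WordNet corpus is installed (for example with `nltk.download('wordnet')`).

```python
def find_unknown_words(tokens, is_known):
    """Return tokens that `is_known` rejects.

    Each distinct token (compared case-insensitively) is checked once and
    reported with the casing of its first occurrence.
    """
    seen = set()
    unknown = []
    for token in tokens:
        key = token.lower()
        if key in seen:
            continue  # skip repeated tokens
        seen.add(key)
        if not is_known(key):
            unknown.append(token)
    return unknown


def wordnet_is_known(word):
    """Option 2: a word counts as known if WordNet has at least one synset."""
    from nltk.corpus import wordnet  # lazy import: only this predicate needs NLTK
    return len(wordnet.synsets(word)) > 0


# Possible usage (requires NLTK and the 'wordnet' corpus):
#   find_unknown_words("some text to check".split(), wordnet_is_known)
```

Passing the predicate as an argument makes it easy to compare the two options on the same input.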

  • 2021-01-02 14:22

    I tried the above approach, but it did not work for many words that should exist, so I tried WordNet. I think it has a more comprehensive vocabulary.

    from nltk.corpus import wordnet

    if wordnet.synsets(word):
        pass  # Do something
    else:
        pass  # Do some other thing
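
One caveat with a WordNet-only check: WordNet covers nouns, verbs, adjectives, and adverbs, so function words such as "the" or "of" have no synsets. A possible workaround is to accept a word if either source knows it. The sketch below is an assumption-laden illustration, not part of the original answer: `make_word_checker` is a hypothetical helper, and both the word list and the synset lookup are passed in as arguments.

```python
def make_word_checker(wordlist, has_synsets):
    """Return a function that accepts a word found in either source.

    wordlist:    iterable of known words (e.g. NLTK's Words Corpus)
    has_synsets: callable returning True if a lexical database knows the word
    """
    vocab = frozenset(w.lower() for w in wordlist)

    def is_word(word):
        w = word.lower()
        # Cheap set lookup first; fall back to the lexical database
        return w in vocab or bool(has_synsets(w))

    return is_word


# Possible usage (requires NLTK with the 'words' and 'wordnet' corpora):
#   from nltk.corpus import words, wordnet
#   is_word = make_word_checker(words.words(), wordnet.synsets)
#   is_word("the")
```

The set lookup runs first because it is cheap; the lexical database is only consulted for words missing from the list.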
