Ngram model and perplexity in NLTK

耶瑟儿~ 2021-02-02 17:20

To put my question in context, I would like to train and test/compare several (neural) language models. In order to focus on the models rather than data preparation I chose to u

1 answer
  •  暖寄归人
    2021-02-02 17:45

    You are getting a low perplexity because you are using a pentagram (5-gram) model. If you used a bigram model, your results would fall in a more typical range of about 50-1000 (or about 5 to 10 bits).
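To make the link between the two units concrete, here is a minimal sketch in plain Python (not NLTK's API), assuming the usual definition of perplexity as 2 raised to the cross-entropy in bits per token. The function names `perplexity` and `perplexity_to_bits` are made up for illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the model probability of each token."""
    # Cross-entropy in bits per token: average negative log2 probability.
    entropy_bits = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** entropy_bits

def perplexity_to_bits(ppl):
    """Convert a perplexity value to cross-entropy in bits per token."""
    return math.log2(ppl)

# The range quoted above, expressed in bits.
print(round(perplexity_to_bits(50), 2))    # 5.64
print(round(perplexity_to_bits(1000), 2))  # 9.97

# A model that assigns probability 1/4 to every token has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

So a perplexity between 50 and 1000 corresponds to roughly 5.6 to 10 bits per token, which matches the rough figure given above.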

    Given your comments, are you using NLTK-3.0alpha? You shouldn't, at least not for language modeling:

    https://github.com/nltk/nltk/issues?labels=model

    As a matter of fact, the whole model module has been dropped from the NLTK-3.0a4 pre-release until the issues are fixed.
