> To put my question in context, I would like to train and test/compare several (neural) language models. In order to focus on the models rather than data preparation I chose to u[…]
You are getting a low perplexity because you are using a pentagram (5-gram) model. If you used a bigram model instead, your results would fall in a more typical range of about 50-1000 (or about 5 to 10 bits).
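To see why those two ways of quoting the number line up, recall that perplexity is just 2 raised to the per-word cross-entropy in bits. A minimal sketch of that conversion (plain Python, using only the figures quoted above):

```python
import math

def perplexity_from_bits(cross_entropy_bits):
    """Perplexity is 2 raised to the per-word cross-entropy in bits."""
    return 2 ** cross_entropy_bits

def bits_from_perplexity(perplexity):
    """Inverse: per-word cross-entropy in bits for a given perplexity."""
    return math.log2(perplexity)

print(perplexity_from_bits(5))     # 32
print(perplexity_from_bits(10))    # 1024
print(bits_from_perplexity(50))    # ~5.6 bits
print(bits_from_perplexity(1000))  # ~10 bits
```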
Given your comments, are you using NLTK-3.0alpha? You shouldn't, at least not for language modeling:
https://github.com/nltk/nltk/issues?labels=model
As a matter of fact, the whole `model` module has been dropped from the NLTK-3.0a4 pre-release until those issues are fixed.
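If you still want to experiment while that module is in flux, later NLTK releases ship a reworked `nltk.lm` package. A minimal sketch along those lines, where the training sentences, the bigram order and the test n-grams are purely illustrative:

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Toy training data: pre-tokenized sentences (illustrative only).
train_sents = [["the", "cat", "sat"], ["the", "dog", "barked"]]
n = 2  # bigram model, as suggested above

# Pad sentences, build n-grams up to order n, and collect the vocabulary.
train_ngrams, vocab = padded_everygram_pipeline(n, train_sents)

# Fit a plain maximum-likelihood n-gram model.
lm = MLE(n)
lm.fit(train_ngrams, vocab)

# Perplexity of a handful of held-out bigrams.
test_bigrams = [("the", "cat"), ("cat", "sat")]
print(lm.perplexity(test_bigrams))
```

Note that an unsmoothed `MLE` model gives infinite perplexity on any unseen n-gram, so for a real comparison you would swap in one of the smoothed models from the same package.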