Estimate Dictionary size using Zipf’s Law

挽巷 2021-01-29 06:43

How would one go about calculating the dictionary size (number of unique words) of a collection using Zipf's law?

1 Answer
  • 2021-01-29 07:11

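    You will have to tokenize your collection, e.g. by splitting on whitespace and punctuation. Then store the tokens in a hash table and count how often each one occurs; the number of distinct keys is the dictionary size of the collection. To compare the counts with Zipf's law, sort them in descending order and plot frequency against rank using a tool like Gnuplot.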
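The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function names (`tokenize`, `rank_frequency`, `loglog_slope`), the tokenization rule, and the sample sentence are all assumptions made for the example. The slope of log(count) against log(rank) is one common way to check how closely a collection follows Zipf's law; a value near -1 is what the classic form of the law predicts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes.

    This splitting rule is an assumption; real collections may need a
    more careful tokenizer.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens)  # the "hash" that stores one count per token
    # Sort by descending count, breaking ties alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def loglog_slope(table):
    """Least-squares slope of log(count) against log(rank).

    Needs at least two distinct ranks, otherwise the denominator is zero.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy collection, used only to show the calls.
sample = "the cat and the dog and the bird"
table = rank_frequency(tokenize(sample))
dictionary_size = len(table)  # number of unique words observed
slope = loglog_slope(table)
```

The `(rank, count)` pairs in `table` can be written out to a file and plotted on logarithmic axes with Gnuplot or any other plotting tool. On a toy sample like this one the slope is not meaningful; it only becomes informative on a reasonably large collection.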
