Question
How would one go about calculating the dictionary size (number of unique words) of a collection using Zipf's law?
Answer 1:
You will have to tokenize your collection, e.g. by whitespace and punctuation. Then store all the tokens in a hash and count how often each one occurs. Finally, plot the distribution of the counts using a tool like Gnuplot.
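As a rough illustration of the tokenize-and-count step, here is a minimal Python sketch. The tokenizer regex, the function names `tokenize` and `rank_frequency`, and the sample sentence are all assumptions made for this example, not part of the original answer. It prints rank and frequency as two columns, which is one layout a plotting tool can read, and the number of distinct keys in the hash is the observed dictionary size.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase everything and treat any run of letters,
    # digits, or apostrophes as a token; whitespace and punctuation split.
    return re.findall(r"[a-z0-9']+", text.lower())


def rank_frequency(text):
    # Counter plays the role of the hash: token -> number of occurrences.
    counts = Counter(tokenize(text))
    # most_common() orders tokens by descending count, so rank 1 is the
    # most frequent token.
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical toy collection; a real one would be read from files.
sample = "the cat sat on the mat and the dog sat too"
table = rank_frequency(sample)

# Two numeric columns (rank, frequency) followed by the token.
for rank, word, freq in table:
    print(rank, freq, word)

# The number of distinct tokens is the observed dictionary size.
print("dictionary size:", len(table))
```

If the output is saved to a file (say `ranks.dat`, a hypothetical name), something like `set logscale xy` followed by `plot "ranks.dat" using 1:2` in Gnuplot should show the rank-frequency curve. Under Zipf's law it is roughly a straight line on log-log axes.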
Source: https://stackoverflow.com/questions/47543798/estimate-dictionary-size-using-zipf-s-law