How would one go about calculating the dictionary size (number of unique words) of a collection using Zipf's law?
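You will have to tokenize your collection, e.g. by splitting on white-space and punctuation. Then store the tokens in a hash and count how often each one occurs; the number of distinct keys in the hash is the dictionary size. To compare against Zipf's law, sort the counts in descending order and plot frequency against rank on log-log axes, using a tool like Gnuplot. A Zipf-like distribution shows up there as a roughly straight line.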
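The steps above could look roughly like this in Python. This is only a minimal sketch: the tokenizer (lowercase, split on anything that is not a word character or apostrophe), the helper names `tokenize`, `rank_frequency` and `write_rank_frequency`, and the tiny sample collection are all assumptions made for illustration, not part of the original answer.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on runs of characters that
    # are neither word characters nor apostrophes; drop empty strings.
    return [tok for tok in re.split(r"[^\w']+", text.lower()) if tok]


def rank_frequency(texts):
    # Count every token across the collection (the "hash" of counts).
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # most_common() sorts by descending count; rank 1 is the most frequent.
    ranked = [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]
    return counts, ranked


def write_rank_frequency(ranked, path):
    # Write "rank frequency" pairs, one per line, for plotting.
    with open(path, "w", encoding="utf-8") as out:
        for rank, _word, freq in ranked:
            out.write(f"{rank} {freq}\n")


# Hypothetical two-document collection, just to exercise the functions.
collection = ["The cat sat on the mat.", "The dog sat, too!"]
counts, ranked = rank_frequency(collection)
dictionary_size = len(counts)  # number of distinct tokens
print(dictionary_size)
```

After calling `write_rank_frequency(ranked, "ranks.dat")` (the file name is arbitrary), the data can be plotted in Gnuplot with `set logscale xy` followed by `plot "ranks.dat" using 1:2 with lines`.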