I have a corpus of almost 2 million documents. I want to calculate the frequency of each term across the whole corpus, regardless of document boundaries.
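To make the goal concrete, here is a minimal sketch of the quantity I mean on a toy input. It assumes lowercased, whitespace-split tokens, which is only a placeholder for real tokenization; the function name `corpus_term_frequencies` and the sample documents are made up for illustration.

```python
from collections import Counter

def corpus_term_frequencies(documents):
    """Count how often each term occurs across all documents combined."""
    counts = Counter()
    for text in documents:
        # Assumption: lowercase + whitespace split stands in for a real tokenizer.
        counts.update(text.lower().split())
    return counts

# Toy stand-in for the corpus; any iterable of strings works,
# so documents can be streamed rather than held in memory at once.
docs = ["the cat sat", "The dog sat down"]
tf = corpus_term_frequencies(docs)
```

Here `tf["sat"]` counts occurrences in both documents together, which is what "regardless of document boundaries" means.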
A naive approach