I have around a quarter of a million blocks of text, akin to forum posts in plain-text form. Given that there is some repetition in these posts, I'd like to discover to what th