Here's the problem: I have a few thousand small text snippets, anywhere from a few words to a few sentences; the largest snippet is about 2 KB on disk. I want to be able to c
See Manning and Raghavan's course notes on MinHashing and searching for similar items, as well as a C#(?) version. I believe the techniques come from Ullman and Motwani's research.
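To make the MinHash idea concrete, here is a minimal Python sketch. It is not the C# version mentioned above or code from the course notes. It works under these assumptions: each snippet is split into word 3-grams ("shingles"), each shingle is mapped to an integer with `zlib.crc32`, and each of 128 hash functions has the form `(a*x + b) mod P` with random `a`, `b`. All function names and parameter values (`shingles`, `minhash_signature`, `similar_pairs`, `k=3`, `NUM_HASHES=128`, the 0.5 threshold) are hypothetical choices for illustration. The key property is that the fraction of positions where two signatures agree estimates the Jaccard similarity of the two shingle sets.

```python
import random
import re
import zlib

# Hypothetical parameters: more hash functions give a tighter estimate.
NUM_HASHES = 128
# Mersenne prime used as the modulus of the hash family h(x) = (a*x + b) mod P.
PRIME = (1 << 61) - 1


def shingles(text: str, k: int = 3) -> set:
    """Return the set of lowercase word k-grams (shingles) of a snippet."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short snippets become a single shingle of all their words.
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def make_hash_params(n: int, seed: int = 0) -> list:
    """Draw n random (a, b) pairs; a fixed seed keeps signatures comparable."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(shingle_set: set, params: list) -> list:
    """For each hash function, keep the minimum hash over all shingles."""
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimated_jaccard(sig1: list, sig2: list) -> float:
    """Fraction of signature positions that agree ~ Jaccard similarity."""
    return sum(1 for u, v in zip(sig1, sig2) if u == v) / len(sig1)


def exact_jaccard(s1: set, s2: set) -> float:
    """Exact |A & B| / |A | B|, for checking the estimate."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def similar_pairs(snippets: list, threshold: float = 0.5, k: int = 3,
                  num_hashes: int = NUM_HASHES) -> list:
    """Compare every pair of signatures and keep pairs above the threshold."""
    params = make_hash_params(num_hashes)
    sigs = [minhash_signature(shingles(t, k), params) for t in snippets]
    pairs = []
    for i in range(len(sigs)):
        for j in range(i + 1, len(sigs)):
            est = estimated_jaccard(sigs[i], sigs[j])
            if est >= threshold:
                pairs.append((i, j, est))
    return pairs


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox jumped over the lazy dog"
    params = make_hash_params(NUM_HASHES)
    sa, sb = shingles(a), shingles(b)
    print("exact:", exact_jaccard(sa, sb))  # 4 shared of 10 distinct shingles
    print("estimate:", estimated_jaccard(minhash_signature(sa, params),
                                         minhash_signature(sb, params)))
    print(similar_pairs([a, a, "alpha beta gamma"]))
```

For a few thousand snippets, comparing all pairs of 128-integer signatures is still a manageable amount of work. For much larger collections, the banding (locality-sensitive hashing) step from the same literature is the usual way to avoid the all-pairs pass.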