What are some good methods to find the “relatedness” of two bodies of text?

小鲜肉 2021-02-02 03:46

Here's the problem: I have a few thousand small text snippets, anywhere from a few words to a few sentences; the largest snippet is about 2 KB on disk. I want to be able to c

7 answers
  •  清酒与你
    2021-02-02 04:04

    See the Manning and Raghavan course notes on MinHashing and searching for similar items, and a C# (?) version. I believe the techniques come from Ullman and Motwani's research.
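
To make the MinHashing suggestion above concrete, here is a minimal Python sketch. It is not the code from the course notes or the C# version mentioned; the function names, the 3-word shingle size, and the 128-hash signature length are all assumptions chosen for illustration. The idea is to turn each snippet into a set of word shingles, keep the minimum hash value under each of several seeded hash functions, and use the fraction of matching signature positions as an estimate of the Jaccard similarity of the two sets.

```python
import hashlib
import re

NUM_HASHES = 128  # assumed signature length; more hashes give a less noisy estimate


def shingles(text, k=3):
    """Return the set of k-word shingles of text (lowercased words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short snippet: fall back to a single shingle of all its words.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """64-bit hash of item under the seed-th hash function (seeded BLAKE2b)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash over the set."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog near the river bank")
    b = shingles("the quick brown fox jumps over the lazy cat near the river bank")
    print("exact:   ", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

With a few thousand snippets, the signatures can be computed once per snippet and compared pairwise; for larger collections, the signatures are usually bucketed with locality-sensitive hashing so that only likely matches are compared.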
