What are some good methods to find the “relatedness” of two bodies of text?

小鲜肉 2021-02-02 03:46

Here's the problem: I have a few thousand small text snippets, anywhere from a few words to a few sentences; the largest snippet is about 2k on disk. I want to be able to c

7 Answers
  •  囚心锁ツ
    2021-02-02 04:12

    Phonetic algorithms

    The article "Beyond SoundEx - Functions for Fuzzy Searching in MS SQL Server" shows how to install the SimMetrics library and use it from SQL Server. The library computes a relative similarity score between two strings and includes numerous algorithms.

    I ended up mostly using Jaro-Winkler to match on names. There's more information in the question I asked on SO about matching names: "Matching records based on Person Name".
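
    For readers who want to see what a Jaro-Winkler score looks like outside SQL Server, here is a minimal pure-Python sketch. It is not the SimMetrics implementation; the function names `jaro` and `jaro_winkler` and the `prefix_scale` parameter are assumptions made for illustration, using the commonly cited defaults (prefix capped at 4 characters, scaling factor 0.1).

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in the range 0..1 (1.0 means identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order, counted in pairs.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

    The prefix boost is the reason this measure tends to suit personal names, where the first few letters are usually the most reliable. The sketch is case-sensitive, so normalising case before comparing is probably a good idea.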

    A few algorithms based on Levenshtein distance are also available in the SimMetrics library and would probably be useful in your application.
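
    As a rough illustration of the Levenshtein idea, independent of SimMetrics, the sketch below computes the edit distance with the standard two-row dynamic-programming approach and turns it into a 0..1 score. The names `levenshtein` and `similarity`, and the choice to normalise by the longer string's length, are assumptions for this example rather than anything the library prescribes.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible

    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to a 0..1 score (1.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity("kitten", "sitting"))
```

    The cost is proportional to the product of the two lengths, which should be fine for snippets of a couple of kilobytes, although comparing every pair among a few thousand snippets may still add up.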
