ALGORITHM - String similarity score/hash

遇见更好的自我 2021-02-01 10:16

Is there a method to calculate something like a general "similarity score" for a string? That is, I am not comparing two strings with each other; instead, I get some number/score for a string.

8 answers
  • 2021-02-01 10:38

    You can always use the Levenshtein distance; there is also an existing implementation of it: http://code.google.com/p/pylevenshtein/

    But, for simplicity, you can use the built-in difflib module:

    >>> import difflib
    >>> l
    {'Hello Earth', 'Hello World!', 'Foo Bar!', 'Foo world!', 'Foo bar', 'Hello World', 'FooBarbar'}
    >>> difflib.get_close_matches("Foo World", l)
    ['Foo world!', 'Hello World', 'Hello World!']
    

    http://docs.python.org/library/difflib.html#difflib.get_close_matches
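
    For the Levenshtein distance mentioned at the top of this answer, here is a minimal pure-Python sketch in case you do not want to install a package. The function names `levenshtein` and `similarity` are my own, and normalizing by the longer string's length is just one possible convention, not the only one.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for maximally different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

    Note that this still compares two strings with each other, so it gives a pairwise score rather than a per-string number.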

  • 2021-02-01 10:51

    Have a look at locality-sensitive hashing.

    The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).

    There's a very good explanation available here together with some sample code.
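
    To make the idea concrete, here is a rough sketch of one common locality-sensitive hashing scheme: MinHash over character trigrams, with the signature split into bands that serve as bucket keys. This is not the sample code referred to above. All names (`shingles`, `minhash_signature`, `lsh_buckets`, `estimated_jaccard`) and the parameters (32 hashes, 8 bands, trigrams, lowercasing) are assumptions chosen for illustration.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Set of lowercase character k-grams (the whole text if it is too short)."""
    text = text.lower()
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed: int, token: str) -> int:
    """One member of a seeded hash family, built on BLAKE2b with the seed as salt."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text: str, num_hashes: int = 32, k: int = 3) -> tuple:
    """Per-string signature: the minimum hash of its shingles under each seed."""
    grams = shingles(text, k)
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(num_hashes))


def lsh_buckets(signature: tuple, bands: int = 8) -> list:
    """Split the signature into bands; each (band index, band values) is a bucket key."""
    rows = len(signature) // bands
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(bands)]


def estimated_jaccard(sig_a: tuple, sig_b: tuple) -> float:
    """Fraction of matching positions, an estimate of shingle-set Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

    Each string gets its signature on its own, without being compared to anything else. Two strings become candidates for being similar when they share at least one bucket key, and `estimated_jaccard` can then be used to rank those candidates.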
