Find the similarity metric between two strings

长情又很酷 2020-11-22 13:24

How do I get the probability of a string being similar to another string in Python?

I want to get a decimal value such as 0.9 (meaning 90%). Preferably using standard Python and its library.

11 Answers
  •  盖世英雄少女心
    2020-11-22 14:06

    Solution #1: Python standard library

    Use SequenceMatcher from difflib.

    Pros: part of the Python standard library, so no extra package is needed.
    Cons: limited. It offers only one algorithm (a variant of Ratcliff/Obershelp "gestalt pattern matching"), and many other good string-similarity algorithms exist.

    Example:
    >>> from difflib import SequenceMatcher
    >>> s = SequenceMatcher(None, "abcd", "bcde")
    >>> s.ratio()
    0.75
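    To get the 0-to-1 score the question asks for, the call above can be wrapped in a small helper. difflib documents ratio() as 2.0*M/T, where M is the number of matching characters and T is the total length of both strings. For "abcd" and "bcde", M = 3 ("bcd") and T = 8, giving 0.75. The helper name similarity and the choice to lower-case both inputs are assumptions for illustration, not part of difflib.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 (hypothetical helper).

    Both strings are lower-cased first, assuming case should not count
    as a difference; drop the .lower() calls if it should.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


print(similarity("abcd", "bcde"))    # 0.75
print(similarity("Apple", "Appel"))  # 0.8
print(similarity("Apple", "Mango"))  # 0.2
```

    Anything above a chosen threshold (say 0.8) could then be treated as "similar"; the threshold itself is a judgment call for your data.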
    

    Solution #2: jellyfish library

    It's a very good library with good coverage and few issues. It supports:
    - Levenshtein Distance
    - Damerau-Levenshtein Distance
    - Jaro Distance
    - Jaro-Winkler Distance
    - Match Rating Approach Comparison
    - Hamming Distance
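    Levenshtein distance, the first metric listed, is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Below is a minimal dynamic-programming sketch written for illustration; it is not jellyfish's own code. Because a distance is not the 0-to-1 score the question asks for, it also includes a hypothetical normalization, 1 - distance / length of the longer string, which is one common choice among several.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via the classic DP table, keeping only one previous row."""
    # prev[j] = distance between the empty prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1 - distance / len(longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("jellyfish", "smellyfish"))              # 2
print(round(levenshtein_similarity("jellyfish", "smellyfish"), 2))  # 0.8
```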

    Pros: easy to use, supports a wide range of algorithms, well tested.
    Cons: not part of the standard library; install it with pip install jellyfish.

    Example:

    >>> import jellyfish
    >>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
    2
    >>> round(jellyfish.jaro_similarity('jellyfish', 'smellyfish'), 4)  # older releases call this jaro_distance
    0.8963
    >>> jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
    1
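    Unlike the two edit distances, Jaro yields a score between 0 and 1 directly, which is the kind of value the question asks for. The pure-Python sketch below follows the textbook Jaro definition (characters match only if equal and at most max(len1, len2) // 2 - 1 positions apart; half the out-of-order matched characters count as transpositions) to show how the value of roughly 0.896 for 'jellyfish' vs 'smellyfish' arises. It is an illustration, not jellyfish's implementation, and the function name jaro is hypothetical; edge cases such as empty strings may be handled differently by the library.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0.0, 1.0] (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Equal characters count as matching only within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters of each string, kept in their original order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    # Half the number of positions where the two sequences disagree.
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


print(round(jaro("jellyfish", "smellyfish"), 4))  # 0.8963
print(round(jaro("MARTHA", "MARHTA"), 4))         # 0.9444
```

    For 'jellyfish' vs 'smellyfish', 8 characters match with no transpositions, so the score is (8/9 + 8/10 + 8/8) / 3.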
    
