How do I convert between a measure of similarity and a measure of difference (distance)?

悲&欢浪女 2021-02-02 01:57

Is there a general way to convert between a measure of similarity and a measure of distance?

Consider a similarity measure like the number of 2-grams that two strings ha

8 Answers
  • 2021-02-02 02:18
    similarity = 1/difference
    

    and watch out for difference = 0, where the division is undefined.
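A minimal sketch of the reciprocal conversion with the zero case handled explicitly. The function name `similarity_from_difference` and the choice to return infinity for identical items are assumptions made for illustration, not something the answer specifies.

```python
import math


def similarity_from_difference(difference: float) -> float:
    """Reciprocal conversion: similarity = 1 / difference."""
    if difference < 0:
        raise ValueError("difference must be non-negative")
    if difference == 0:
        # Assumed convention: identical items get infinite similarity
        # instead of raising ZeroDivisionError.
        return math.inf
    return 1.0 / difference
```

Returning infinity keeps the ordering intact (nothing is more similar than an identical item), but it may not suit code that later averages or normalizes the values.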

  • 2021-02-02 02:23

    Yes, there is a very general way to convert between similarity and distance: a strictly monotone decreasing function f(x).

    That is, with f(x) you can set similarity = f(distance) or distance = f(similarity). It works in both directions. Such a function works because similarity and distance are inversely related: one decreases when the other increases.

    Examples:

    These are some well-known strictly monotone decreasing candidates that work for non-negative similarities or distances:

    • f(x) = 1 / (a + x)
    • f(x) = exp(- x^a)
    • f(x) = arccot(ax)

    You can choose any parameter a > 0 (e.g., a = 1).
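The three candidates above can be sketched in Python as follows. The function names are hypothetical, and since the `math` module has no arccot function, it is computed here as pi/2 - atan(ax), which matches arccot for non-negative arguments.

```python
import math


def reciprocal(x: float, a: float = 1.0) -> float:
    """f(x) = 1 / (a + x)"""
    return 1.0 / (a + x)


def exp_power(x: float, a: float = 1.0) -> float:
    """f(x) = exp(-x^a)"""
    return math.exp(-(x ** a))


def arccot(x: float, a: float = 1.0) -> float:
    """f(x) = arccot(a * x), computed as pi/2 - atan(a * x)."""
    return math.pi / 2 - math.atan(a * x)


# Check on a few non-negative sample distances that each function
# is strictly decreasing, so larger distance gives smaller similarity.
distances = [0.0, 0.5, 1.0, 2.0, 5.0]
for f in (reciprocal, exp_power, arccot):
    values = [f(d) for d in distances]
    assert all(u > v for u, v in zip(values, values[1:]))
```

Note that the three functions have different ranges (at most 1/a, at most 1, and at most pi/2 respectively), so the resulting similarities are not directly comparable with each other.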

  • 2021-02-02 02:25

    In one of my projects (based on collaborative filtering) I had to convert a correlation (the cosine between vectors), which ranges from -1 to 1 (closer to 1 is more similar, closer to -1 is more diverse), into a normalized distance (closer to 0 means a smaller distance, closer to 1 a bigger one).

    In this case: distance ~ diversity

    My formula was: dist = 1 - (cor + 1)/2

    If you are converting between similarity and diversity and the range is [0, 1] in both cases, the simplest way is:

    dist = 1 - sim

    sim = 1 - dist
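Both formulas from this answer, written as a small sketch. The function names and the range checks are assumptions added for illustration.

```python
def correlation_to_distance(cor: float) -> float:
    """Map a correlation in [-1, 1] to a distance in [0, 1]:
    dist = 1 - (cor + 1) / 2
    """
    if not -1.0 <= cor <= 1.0:
        raise ValueError("cor must be in [-1, 1]")
    return 1.0 - (cor + 1.0) / 2.0


def similarity_to_distance(sim: float) -> float:
    """For values already in [0, 1]: dist = 1 - sim.

    The same expression converts back, since sim = 1 - dist.
    """
    if not 0.0 <= sim <= 1.0:
        raise ValueError("sim must be in [0, 1]")
    return 1.0 - sim
```

A correlation of 1 maps to distance 0, a correlation of -1 maps to distance 1, and uncorrelated vectors land in the middle at 0.5.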

  • 2021-02-02 02:26

    Doing 1/similarity is not going to keep the properties of the distribution.

    The best way is distance(a->b) = highest similarity - similarity(a->b), where highest similarity is the largest similarity value. You thereby flip your distribution: the highest similarity becomes 0, and so on.
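A short sketch of this flip, assuming the similarities are available as a plain list of numbers. The function name `flip_similarities` is hypothetical.

```python
from typing import List, Sequence


def flip_similarities(similarities: Sequence[float]) -> List[float]:
    """distance = highest similarity - similarity, for every entry."""
    highest = max(similarities)
    return [highest - s for s in similarities]


# The most similar pair ends up at distance 0; gaps between values are preserved.
print(flip_similarities([3, 7, 5]))
```

Because the transformation is linear, differences between values stay the same, which is the property the reciprocal does not keep.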

  • 2021-02-02 02:30

    Let d denote distance and s denote similarity. To convert a distance measure to a similarity measure, first normalize d to [0, 1] using d_norm = d/max(d). The similarity measure is then given by:

    s = 1 - d_norm

    where s is in the range [0, 1], with 1 denoting the highest similarity (the items being compared are identical) and 0 denoting the lowest similarity (largest distance).
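The normalization step can be sketched like this, assuming the distances are non-negative and held in a list. The function name and the handling of the case where every distance is 0 are assumptions, since the answer does not cover that case.

```python
from typing import List, Sequence


def distances_to_similarities(distances: Sequence[float]) -> List[float]:
    """s = 1 - d / max(d) for non-negative distances."""
    d_max = max(distances)
    if d_max == 0:
        # Assumed convention: all items are identical, so every similarity is 1.
        return [1.0 for _ in distances]
    return [1.0 - d / d_max for d in distances]
```

The result depends on max(d) in the data at hand, so similarities computed from two different data sets are not on a common scale unless the same maximum is used for both.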

  • 2021-02-02 02:31

    If your similarity measure (s) is between 0 and 1, you can use one of these:

    1-s
    sqrt(1-s)
    -log(s)
    (1/s)-1
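The four options side by side, as a sketch. The function name `distance_candidates` is hypothetical, and the input check reflects the fact that -log(s) and (1/s)-1 are undefined at s = 0, so this sketch only accepts s in (0, 1].

```python
import math
from typing import Dict


def distance_candidates(s: float) -> Dict[str, float]:
    """Evaluate each candidate distance for a similarity s in (0, 1]."""
    if not 0.0 < s <= 1.0:
        raise ValueError("s must be in (0, 1]")
    return {
        "1-s": 1.0 - s,                   # bounded in [0, 1)
        "sqrt(1-s)": math.sqrt(1.0 - s),  # bounded in [0, 1)
        "-log(s)": -math.log(s),          # unbounded as s approaches 0
        "(1/s)-1": 1.0 / s - 1.0,         # unbounded as s approaches 0
    }


for name, value in distance_candidates(0.25).items():
    print(name, value)
```

All four give distance 0 at s = 1. The first two stay bounded, while the last two grow without limit as the similarity approaches 0.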
    