Effective way to calculate a similarity percentage between data sets

南笙 2021-02-09 22:11

I am currently working with User objects, each of which has many Goal objects. The Goal objects are not User-specific; that is, Users can share the same Goal. I am attempting

1 answer
  • 2021-02-09 22:40

    The standard way to do this is the Jaccard similarity. If A is the set of goals of the first user and B is the set of goals of the second user, the Jaccard similarity is:

    #(A intersect B)/#(A union B)
    

    This is the number of goals they share divided by the total number of distinct goals the two have together (a goal that both users have is counted only once). So if the first user has goals A={1,2,3} and the second user has goals B={2,4}, then:

    A intersect B = {2}
    A union B = {1,2,3,4}
    
    #(A intersect B)/#(A union B) = 1/4
    

    The Jaccard similarity is always between 0 (they share no goals) and 1 (they have the same goals), so you can get a percentage by multiplying it by 100.
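
    As a rough illustration, the calculation could be sketched in Python like this. The function names are made up for this example, goals are assumed to be represented by hashable IDs, and returning 0 when neither user has any goals is an assumption (the ratio is undefined in that case):

```python
def jaccard_similarity(goals_a, goals_b):
    """Return #(A intersect B) / #(A union B) for two collections of goal IDs."""
    a, b = set(goals_a), set(goals_b)
    union = a | b
    if not union:
        # Assumption: two users with no goals at all are treated as 0% similar,
        # since the ratio 0/0 is undefined.
        return 0.0
    return len(a & b) / len(union)


def similarity_percentage(goals_a, goals_b):
    """Scale the Jaccard similarity (0 to 1) to a percentage (0 to 100)."""
    return 100.0 * jaccard_similarity(goals_a, goals_b)


# The example from above: A = {1, 2, 3}, B = {2, 4}
print(similarity_percentage({1, 2, 3}, {2, 4}))  # 25.0
```

    Converting the inputs to sets means duplicate goal IDs in either list do not affect the result.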

    http://en.wikipedia.org/wiki/Jaccard_index
