What is wrong with this Python function from “Programming Collective Intelligence”?


This is the function in question. It calculates the Pearson correlation coefficient for p1 and p2, which is supposed to be a number between -1 and 1.

When I use this


4 Answers

灰色年华

2021-01-06 07:36

It took me a minute to read through the code, but it seems that if you change your input data to floats, it will work.
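One way to apply that suggestion, as a minimal sketch. It assumes the ratings live in a nested dict shaped like the `critics` data used elsewhere on this page; `to_float_ratings` is a hypothetical helper name, not something from the book.

```python
def to_float_ratings(prefs):
    """Return a copy of a {user: {item: rating}} dict with every rating as a float."""
    return {user: {item: float(score) for item, score in ratings.items()}
            for user, ratings in prefs.items()}

# Integer ratings, as in the example data on this page
critics = {
    'user1': {'item1': 3, 'item2': 5, 'item3': 5},
    'user2': {'item1': 4, 'item2': 5, 'item3': 5},
}

float_critics = to_float_ratings(critics)
print(float_critics['user1'])
```

With float ratings, every sum built from them is a float too, so a later `/ n` no longer truncates under Python 2.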


佛祖请我去吃肉

2021-01-06 07:38

Well, I wasn't exactly able to find what's wrong with the logic in your function, so I just reimplemented it using the definition of Pearson coefficient:

from math import sqrt

def sim_pearson(p1,p2):
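    # Use only the items both users rated; a union would raise KeyError
    # as soon as one user has an item the other lacks.
    keys = set(p1) & set(p2)
    n = float(len(keys))  # float keeps the means below exact under Python 2 as well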

    a1 = sum(p1[it] for it in keys) / n
    a2 = sum(p2[it] for it in keys) / n

#    print(a1, a2)

    sum1Sq = sum((p1[it] - a1) ** 2 for it in keys)
    sum2Sq = sum((p2[it] - a2) ** 2 for it in keys) 

    num = sum((p1[it] - a1) * (p2[it] - a2) for it in keys)
    den = sqrt(sum1Sq * sum2Sq)

#    print(sum1Sq, sum2Sq, num, den)
    if den == 0:
        return 0  # one user gave identical ratings everywhere; correlation is undefined
    return num / den

critics = {
    'user1':{
        'item1': 3,
        'item2': 5,
        'item3': 5,
        },

    'user2':{
        'item1': 4,
        'item2': 5,
        'item3': 5,
        }
}

assert 0.999 < sim_pearson(critics['user1'], critics['user1']) < 1.0001

print('Your example:', sim_pearson(critics['user1'], critics['user2']))
print('Another example:', sim_pearson({1: 1, 2: 2, 3: 3}, {1: 4, 2: 0, 3: 1}))

Note that in your example the Pearson coefficient is exactly 1.0, since the mean-centered rating vectors (-4/3, 2/3, 2/3) and (-2/3, 1/3, 1/3) are parallel and point in the same direction.
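The parallel-vectors claim can be checked with exact arithmetic. This is a small sketch using the standard `fractions` module; `centered` is a hypothetical helper name introduced only for illustration.

```python
from fractions import Fraction

def centered(ratings):
    """Subtract the mean from each rating, keeping exact fractions."""
    mean = Fraction(sum(ratings), len(ratings))
    return [r - mean for r in ratings]

d1 = centered([3, 5, 5])   # user1's ratings from the example
d2 = centered([4, 5, 5])   # user2's ratings from the example

# If every component ratio is the same positive constant, the vectors are parallel
ratios = [a / b for a, b in zip(d1, d2)]
print(d1)
print(d2)
print(ratios)
```

Every ratio comes out as 2, so the first vector is twice the second, which is why the coefficient is 1.0 for this data.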


小蘑菇

2021-01-06 07:39
Integer division is confusing it. It works if you make n a float:
```
n=float(len(si))
```
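To see what that one-line change does, here is a small sketch written for Python 3, where `//` on two ints behaves like Python 2's `/` on two ints. The contents of `si` and the value of `sum1` are assumptions taken from the three-item example on this page.

```python
si = ['item1', 'item2', 'item3']   # assumed list of items both users rated
sum1 = 3 + 5 + 5                   # user1's ratings from the example

n_int = len(si)                    # what the function has before the fix
n_float = float(len(si))           # what it has after the fix

truncated = pow(sum1, 2) // n_int  # int divided by int, as Python 2's / would do
exact = pow(sum1, 2) / n_float     # int divided by float is true division everywhere

print(truncated)
print(exact)
```

The truncated value drops the fractional part of 169/3, and that lost third is enough to distort the final coefficient.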
半阙折子戏

2021-01-06 07:59
It looks like you may be unexpectedly using integer division. I made the following change and your function returned 1.0:
```
num=pSum-(1.0*sum1*sum2/n)
den=sqrt((sum1Sq-1.0*pow(sum1,2)/n)*(sum2Sq-1.0*pow(sum2,2)/n))
```
See PEP 238 for more information on the division operator in Python. An alternative fix is to put this import at the top of the module, which makes `/` perform true division everywhere in that file:
```
from __future__ import division
```