Scipy: Pearson's correlation always returning 1

Question

I am using the Python library SciPy to calculate Pearson's correlation coefficient for two float arrays. The returned coefficient is always 1.0, even when the arrays are different. For example:

[-0.65499887  2.34644428]
[-1.46049758  3.86537321]

I am calling the routine in this way:

r_row, p_value = scipy.stats.pearsonr(array1, array2)

The value of r_row is always 1.0. What am I doing wrong?

Answer 1:

Pearson's correlation coefficient measures how well your data is fitted by a straight line. If you provide only two points, there is always a line passing exactly through both, so the data fit a line perfectly and the correlation coefficient is exactly 1 (or -1 if that line slopes downward).
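To make the two-point case concrete, here is a minimal sketch that computes the coefficient from its textbook definition in plain Python. The helper name pearson_r is hypothetical and is not part of SciPy. It only illustrates the arithmetic, using the two arrays from the question and a second pair with the y values swapped.

```python
import math


def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from its textbook definition."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Sum of products of deviations, normalized by both spreads
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# The two arrays from the question: both increase, so r is (numerically) 1
r_same = pearson_r([-0.65499887, 2.34644428], [-1.46049758, 3.86537321])

# Swapping the y values makes the line slope downward, so r is (numerically) -1
r_flipped = pearson_r([-0.65499887, 2.34644428], [3.86537321, -1.46049758])

print(r_same, r_flipped)
```

With only two points, the deviations from the mean are always equal and opposite in each array, so the numerator and denominator have the same magnitude whatever the values are. The result may differ from 1 or -1 by floating-point rounding.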

Answer 2:

I think the Pearson correlation coefficient is always 1.0 or -1.0 when each array has just two elements, since you can always draw a perfect straight line through two points. Try it with arrays of length 3 and it will work:

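import numpy as np
from scipy.stats import pearsonr

# scipy.array is a deprecated alias of numpy.array, so build the arrays with NumPy directly
x = np.array([-0.65499887,  2.34644428, 3.0])
y = np.array([-1.46049758,  3.86537321, 21.0])

# With three points the fit is no longer forced to be perfect
r_row, p_value = pearsonr(x, y)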

Result:

>>> r_row
0.79617014831975552
>>> p_value
0.41371200873701036

Source: https://stackoverflow.com/questions/16063839/scipy-pearsons-correlation-always-returning-1

Tags

python

statistics

scipy

correlation

pearson
