Diagonal element for covariance matrix not 1 pandas/numpy

依然范特西╮ submitted on 2020-05-26 06:34:27

Question


I have the following dataframe:

   A  B
0  1  5
1  2  6
2  3  7
3  4  8

I wish to calculate the covariance of the two columns:

a = df.iloc[:,0].values

b = df.iloc[:,1].values

Computing the covariance with numpy:

numpy.cov(a,b)

I get:

array([[ 1.66666667,  1.66666667],
       [ 1.66666667,  1.66666667]])

Shouldn't the diagonal elements be 1? How do I get the diagonal elements to 1?


Answer 1:


No, they shouldn't. I think you might be confusing covariance with correlation; the two are different quantities.

What you see on the diagonal is simply the variance of each variable. The Wikipedia article on the covariance matrix gives the formulas.
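To make this concrete, here is a minimal sketch that rebuilds the two columns from the question as plain arrays and compares the diagonal of numpy.cov with the sample variances. It assumes the default normalization of numpy.cov (division by N - 1), which is why ddof=1 is passed to numpy.var.

```python
import numpy as np

# Columns A and B from the question, rebuilt as plain arrays
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# 2x2 covariance matrix; np.cov divides by N - 1 by default
cov = np.cov(a, b)

# Sample variances; ddof=1 matches the N - 1 normalization of np.cov
var_a = np.var(a, ddof=1)
var_b = np.var(b, ddof=1)

print(np.diag(cov))   # diagonal of the covariance matrix
print(var_a, var_b)   # the same two numbers, computed directly
```

Both prints should show the same pair of values (5/3 for each column here), so the diagonal is a variance and is 1 only for data that happens to have unit variance.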




Answer 2:


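Use pd.DataFrame.corr. There is no need for NumPy here, since the built-in pandas method does the job. The diagonal of a correlation matrix is always 1 because each series is normalized by its own standard deviation. In this example the off-diagonal entries are also 1, because B is just A shifted by 4 and the two columns are perfectly correlated.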

df.corr() 

     A    B
A  1.0  1.0
B  1.0  1.0

pd.DataFrame.cov, by contrast, gives you

df.cov()

          A         B
A  1.666667  1.666667
B  1.666667  1.666667

The other posters are correct. Dividing the covariance matrix by the standard deviations, once along the columns and once along the rows, reproduces the correlation matrix:

df.cov().div(df.std()).div(df.std(), 0)

     A    B
A  1.0  1.0
B  1.0  1.0



Answer 3:


I believe the function you are looking for is numpy.corrcoef rather than numpy.cov.

The relationship between the correlation matrix R and the covariance matrix C is as follows:

R[i,j] = C[i,j]/sqrt(C[i,i]*C[j,j])
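As a quick check, the sketch below applies this conversion by hand, dividing each covariance C[i,j] by the product of the two standard deviations sqrt(C[i,i]) and sqrt(C[j,j]), and compares the result with numpy.corrcoef. The arrays are the question's columns; the variable names d and R are only illustrative.

```python
import numpy as np

# Columns A and B from the question
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

C = np.cov(a, b)            # covariance matrix
d = np.sqrt(np.diag(C))     # standard deviations: sqrt of the diagonal

# R[i, j] = C[i, j] / (d[i] * d[j]); np.outer builds the d[i] * d[j] table
R = C / np.outer(d, d)

print(R)
print(np.allclose(R, np.corrcoef(a, b)))  # manual result vs. np.corrcoef
```

The diagonal of R is 1 by construction, because each C[i,i] is divided by itself.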


Source: https://stackoverflow.com/questions/47427238/diagonal-element-for-covariance-matrix-not-1-pandas-numpy
