Correlation between columns in DataFrame

后端未结

关注

 1  890

离开以前 2021-01-30 06:56

I\'m pretty new to pandas, so I guess I\'m doing something wrong -

I have a DataFrame:

     a     b
0  0.5  0.75
1  0.5  0.75
2  0.5  0.75
3  0.5  0.75


      
      
        
          1条回答        

        
                    
            
            
                         
                
              
              
                
                   礼貌的吻别
                                             
                
                
                (楼主)
            
              
              
                2021-01-30 07:53
              

            
            
                        
np.correlate calculates the (unnormalized) cross-correlation between two 1-dimensional sequences:

z[k] = sum_n a[n] * conj(v[n+k])


while df.corr (by default) calculates the Pearson correlation coefficient. 

The correlation coefficient (if it exists) is always between -1 and 1 inclusive.
The cross-correlation is not bounded.

The formulas are somewhat related, but notice that in the cross-correlation formula (above) there is no subtraction of the means, and no division by the standard deviations which is part of the formula for Pearson correlation coefficient.

The fact that the standard deviation of df['a'] and df['b'] is zero is what causes df.corr to be NaN everywhere. 



From the comment below, it sounds like you are looking for Beta. It is related to Pearson's correlation coefficient, but instead of dividing by the product of standard deviations:



you divide by a variance:





You can compute Beta using np.cov

cov = np.cov(a, b)
beta = cov[1, 0] / cov[0, 0]




import numpy as np
import matplotlib.pyplot as plt
np.random.seed(100)


def geometric_brownian_motion(T=1, N=100, mu=0.1, sigma=0.01, S0=20):
    """
    http://stackoverflow.com/a/13203189/190597 (unutbu)
    """
    dt = float(T) / N
    t = np.linspace(0, T, N)
    W = np.random.standard_normal(size=N)
    W = np.cumsum(W) * np.sqrt(dt)  # standard brownian motion ###
    X = (mu - 0.5 * sigma ** 2) * t + sigma * W
    S = S0 * np.exp(X)  # geometric brownian motion ###
    return S

N = 10 ** 6
a = geometric_brownian_motion(T=1, mu=0.1, sigma=0.01, N=N)
b = geometric_brownian_motion(T=1, mu=0.2, sigma=0.01, N=N)

cov = np.cov(a, b)
print(cov)
# [[ 0.38234755  0.80525967]
#  [ 0.80525967  1.73517501]]
beta = cov[1, 0] / cov[0, 0]
print(beta)
# 2.10609347015

plt.plot(a)
plt.plot(b)
plt.show()




The ratio of mus is 2, and beta is ~2.1.



And you could also compute it with df.corr, though this is a much more round-about way of doing it (but it is nice to see there is consistency):

import pandas as pd
df = pd.DataFrame({'a': a, 'b': b})
beta2 = (df.corr() * df['b'].std() * df['a'].std() / df['a'].var()).ix[0, 1]
print(beta2)
# 2.10609347015
assert np.allclose(beta, beta2)

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                    
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复