Correlation between columns in DataFrame

离开以前 2021-01-30 06:56

I'm pretty new to pandas, so I guess I'm doing something wrong.

I have a DataFrame:

     a     b
0  0.5  0.75
1  0.5  0.75
2  0.5  0.75
3  0.5  0.75
         


        
1 Answer
  •  礼貌的吻别
    2021-01-30 07:53

    np.correlate calculates the (unnormalized) cross-correlation between two 1-dimensional sequences:

    z[k] = sum_n a[n] * conj(v[n+k])
    

    while df.corr (by default) calculates the Pearson correlation coefficient.

    The correlation coefficient (if it exists) is always between -1 and 1 inclusive. The cross-correlation is not bounded.

    The formulas are somewhat related, but notice that the cross-correlation formula (above) involves no subtraction of the means and no division by the standard deviations, both of which are part of the formula for the Pearson correlation coefficient.

    The fact that the standard deviation of df['a'] and df['b'] is zero is what causes df.corr to be NaN everywhere.
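
    To make the difference concrete, here is a minimal sketch that rebuilds the constant-valued DataFrame from the question (the values are copied from the table shown there) and applies both functions. The variable names xcorr and pearson are only illustrative.

```python
import numpy as np
import pandas as pd

# The constant-valued frame from the question
df = pd.DataFrame({'a': [0.5] * 4, 'b': [0.75] * 4})

# Unnormalized cross-correlation; for equal-length inputs the default
# mode returns a single value, the sum of a[n] * b[n]
xcorr = np.correlate(df['a'].to_numpy(), df['b'].to_numpy())
print(xcorr)

# Pearson correlation: both columns have zero standard deviation,
# so every entry of the correlation matrix is NaN
pearson = df.corr()
print(pearson)
```

    The cross-correlation is simply 4 * 0.5 * 0.75, while the Pearson coefficient is undefined because its denominator is zero.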


    From the comment below, it sounds like you are looking for Beta. It is related to Pearson's correlation coefficient, but instead of dividing the covariance by the product of the standard deviations:

        corr(a, b) = cov(a, b) / (std(a) * std(b))

    you divide by a variance:

        beta = cov(a, b) / var(a)


    You can compute Beta using np.cov:

    cov = np.cov(a, b)
    beta = cov[1, 0] / cov[0, 0]
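
    As a sanity check on this formula, Beta computed this way should match the slope of an ordinary least-squares line fitted to b as a function of a. The sketch below assumes a small set of made-up numbers (not from the question) and uses np.polyfit for the fit.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Beta as covariance over variance (a occupies row/column 0 of the matrix)
cov = np.cov(a, b)
beta = cov[1, 0] / cov[0, 0]

# Slope of the degree-1 least-squares fit of b against a
slope = np.polyfit(a, b, 1)[0]

print(beta, slope)
```

    The two numbers agree up to floating-point error, since the least-squares slope is the covariance divided by the variance of the regressor.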
    

    import numpy as np
    import matplotlib.pyplot as plt
    np.random.seed(100)
    
    
    def geometric_brownian_motion(T=1, N=100, mu=0.1, sigma=0.01, S0=20):
        """
        http://stackoverflow.com/a/13203189/190597 (unutbu)
        """
        dt = float(T) / N
        t = np.linspace(0, T, N)
        W = np.random.standard_normal(size=N)
        W = np.cumsum(W) * np.sqrt(dt)  # standard brownian motion ###
        X = (mu - 0.5 * sigma ** 2) * t + sigma * W
        S = S0 * np.exp(X)  # geometric brownian motion ###
        return S
    
    N = 10 ** 6
    a = geometric_brownian_motion(T=1, mu=0.1, sigma=0.01, N=N)
    b = geometric_brownian_motion(T=1, mu=0.2, sigma=0.01, N=N)
    
    cov = np.cov(a, b)
    print(cov)
    # [[ 0.38234755  0.80525967]
    #  [ 0.80525967  1.73517501]]
    beta = cov[1, 0] / cov[0, 0]
    print(beta)
    # 2.10609347015
    
    plt.plot(a)
    plt.plot(b)
    plt.show()
    

    [Figure: plot of the simulated series a and b]

    The ratio of the mus is 0.2 / 0.1 = 2, and beta is ~2.1.


    And you could also compute it with df.corr, though this is a much more round-about way of doing it (but it is nice to see there is consistency):

    import pandas as pd
    df = pd.DataFrame({'a': a, 'b': b})
    beta2 = (df.corr() * df['b'].std() * df['a'].std() / df['a'].var()).iloc[0, 1]
    print(beta2)
    # 2.10609347015
    assert np.allclose(beta, beta2)
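
    If the data already lives in a DataFrame, a more direct route is probably DataFrame.cov, which skips the detour through the correlation matrix. This is a sketch on synthetic data (seeded random numbers with a true slope of 2, assumed only for illustration), not the series generated above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, illustrative data only
x = rng.normal(size=1000)
df = pd.DataFrame({'a': x, 'b': 2.0 * x + rng.normal(scale=0.1, size=1000)})

# Covariance of a and b divided by the variance of a, all within pandas
beta_pd = df.cov().loc['a', 'b'] / df['a'].var()

# Same quantity via np.cov, for comparison
cov = np.cov(df['a'].to_numpy(), df['b'].to_numpy())
beta_np = cov[1, 0] / cov[0, 0]

print(beta_pd, beta_np)
```

    Both DataFrame.cov and Series.var use the same default normalization (N - 1), so the ratio matches the np.cov result.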
    
