I'm pretty new to pandas, so I guess I'm doing something wrong -
I have a DataFrame:
     a     b
0  0.5  0.75
1  0.5  0.75
2  0.5  0.75
3  0.5  0.75
np.correlate calculates the (unnormalized) cross-correlation between two 1-dimensional sequences:
z[k] = sum_n a[n] * conj(v[n+k])
while df.corr (by default) calculates the Pearson correlation coefficient.
The correlation coefficient (if it exists) is always between -1 and 1 inclusive. The cross-correlation is not bounded.
The formulas are somewhat related, but notice that the cross-correlation formula (above) has no subtraction of the means and no division by the standard deviations, both of which are part of the formula for the Pearson correlation coefficient.
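To make the relationship concrete, here is a small sketch showing that the zero-lag cross-correlation turns into the Pearson coefficient once you subtract the means and divide by the standard deviations. The arrays `x` and `y` are made-up illustration data, not taken from the question.

```python
import numpy as np

# Made-up illustration data (not from the question)
x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 3.0, 3.5, 9.0])

# Raw cross-correlation at zero lag: just sum(x * y), no centering or scaling
raw = np.correlate(x, y)[0]

# Center both series, then divide by n * std(x) * std(y)
# (np.std uses ddof=0 by default, which matches the factor n here)
centered = np.correlate(x - x.mean(), y - y.mean())[0]
pearson = centered / (len(x) * x.std() * y.std())

print(raw)      # unbounded; depends on the scale of the data
print(pearson)  # agrees with np.corrcoef(x, y)[0, 1]
```

With equal-length inputs and the default `mode='valid'`, `np.correlate` returns a single value, which is why indexing with `[0]` is enough here.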
The fact that the standard deviations of df['a'] and df['b'] are zero is what causes df.corr to be NaN everywhere.
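As a quick check, the following sketch rebuilds the DataFrame from the question and compares the two functions on it. It assumes a reasonably recent pandas (for `Series.to_numpy`).

```python
import numpy as np
import pandas as pd

# The DataFrame from the question: both columns are constant
df = pd.DataFrame({'a': [0.5] * 4, 'b': [0.75] * 4})

# Cross-correlation at zero lag is the sum of products: 4 * 0.5 * 0.75
xcorr = np.correlate(df['a'].to_numpy(), df['b'].to_numpy())
print(xcorr)       # [1.5]

# Both standard deviations are zero, so Pearson's formula has a zero divisor
print(df.std())

corr = df.corr()
print(corr)        # every entry is NaN
```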
From the comment below, it sounds like you are looking for Beta. It is related to Pearson's correlation coefficient, but instead of dividing by the product of standard deviations:
    corr(a, b) = cov(a, b) / (std(a) * std(b))
you divide by a variance:
    beta = cov(a, b) / var(a)
You can compute Beta using np.cov:
cov = np.cov(a, b)
beta = cov[1, 0] / cov[0, 0]
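A minimal sketch of how Beta relates to Pearson's coefficient, using made-up data (`x`, `y`, and the slope of 3 are illustration choices, not from the question). It also checks the standard fact that cov(x, y) / var(x) is the least-squares slope of y regressed on x, here via `np.polyfit`.

```python
import numpy as np

# Made-up data: y is roughly 3 * x plus noise (illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3.0 * x + rng.normal(scale=0.5, size=1000)

cov = np.cov(x, y)                                  # 2x2 covariance matrix (ddof=1)
beta = cov[1, 0] / cov[0, 0]                        # cov(x, y) / var(x)
rho = cov[1, 0] / np.sqrt(cov[0, 0] * cov[1, 1])    # cov(x, y) / (std(x) * std(y))

# Beta is Pearson's coefficient rescaled by the ratio of standard deviations
rescaled = rho * np.std(y, ddof=1) / np.std(x, ddof=1)

# Beta also equals the least-squares slope of y regressed on x
slope, intercept = np.polyfit(x, y, 1)

print(beta, rescaled, slope)  # three ways to get the same number
```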
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(100)
def geometric_brownian_motion(T=1, N=100, mu=0.1, sigma=0.01, S0=20):
    """
    http://stackoverflow.com/a/13203189/190597 (unutbu)
    """
    dt = float(T) / N
    t = np.linspace(0, T, N)
    W = np.random.standard_normal(size=N)
    W = np.cumsum(W) * np.sqrt(dt)  # standard brownian motion ###
    X = (mu - 0.5 * sigma ** 2) * t + sigma * W
    S = S0 * np.exp(X)  # geometric brownian motion ###
    return S
N = 10 ** 6
a = geometric_brownian_motion(T=1, mu=0.1, sigma=0.01, N=N)
b = geometric_brownian_motion(T=1, mu=0.2, sigma=0.01, N=N)
cov = np.cov(a, b)
print(cov)
# [[ 0.38234755 0.80525967]
# [ 0.80525967 1.73517501]]
beta = cov[1, 0] / cov[0, 0]
print(beta)
# 2.10609347015
plt.plot(a)
plt.plot(b)
plt.show()
The ratio of the mu values is 2, and beta is ~2.1.
You could also compute it with df.corr, though this is a much more roundabout way of doing it (but it is nice to see that the results are consistent):
import pandas as pd
df = pd.DataFrame({'a': a, 'b': b})
# .ix was removed in pandas 1.0; .iloc does the same positional lookup
beta2 = (df.corr() * df['b'].std() * df['a'].std() / df['a'].var()).iloc[0, 1]
print(beta2)
# 2.10609347015
assert np.allclose(beta, beta2)
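If you are already working in pandas, a shorter route is probably `DataFrame.cov`, which returns the labeled covariance matrix directly. This is a sketch on a small made-up frame (the data and the name `beta3` are illustration choices), not part of the example above.

```python
import numpy as np
import pandas as pd

# Small made-up frame: b is roughly 2 * a plus noise (illustration only)
rng = np.random.default_rng(1)
a = rng.normal(size=500)
df = pd.DataFrame({'a': a, 'b': 2.0 * a + rng.normal(scale=0.1, size=500)})

# DataFrame.cov and Series.var both use ddof=1, like np.cov
beta3 = df.cov().loc['a', 'b'] / df['a'].var()
print(beta3)  # close to the slope of 2 used to build the data
```

Looking the entry up by label with `.loc['a', 'b']` avoids having to remember which row and column of the matrix belong to which series.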