I can't find a reason why calculating the correlation between two series A and B using numpy.correlate
gives me different results than the ones I obtain using statsmodels.tsa.stattools.ccf
Here's an example of this difference I mention:
import numpy as np
from matplotlib import pyplot as plt
from statsmodels.tsa.stattools import ccf
#Calculate correlation using numpy.correlate
def corr(x,y):
result = numpy.correlate(x, y, mode='full')
return result[result.size/2:]
#This are the data series I want to analyze
A = np.array([np.absolute(x) for x in np.arange(-1,1.1,0.1)])
B = np.array([x for x in np.arange(-1,1.1,0.1)])
#Using numpy i get this
plt.plot(corr(B,A))
#Using statsmodels i get this
plt.plot(ccf(B,A,unbiased=False))
The results seem qualitatively different, where does this difference come from?
statsmodels.tsa.stattools.ccf
is based on np.correlate
but does some additional things to give the correlation in the statistical sense instead of the signal processing sense, see cross-correlation on Wikipedia. What happens exactly you can see in the source code, it's very simple.
For easier reference I copied the relevant lines below:
def ccovf(x, y, unbiased=True, demean=True):
n = len(x)
if demean:
xo = x - x.mean()
yo = y - y.mean()
else:
xo = x
yo = y
if unbiased:
xi = np.ones(n)
d = np.correlate(xi, xi, 'full')
else:
d = n
return (np.correlate(xo, yo, 'full') / d)[n - 1:]
def ccf(x, y, unbiased=True):
cvf = ccovf(x, y, unbiased=unbiased, demean=True)
return cvf / (np.std(x) * np.std(y))
来源:https://stackoverflow.com/questions/24616671/numpy-and-statsmodels-give-different-values-when-calculating-correlations-how-t