I have a large set of time series (> 500), I\'d like to select only the ones that are periodic. I did a bit of literature research and I found out that I should look for autocor
I would use mode='same' instead of mode='full' because with mode='full' we get covariances for extreme shifts, where just 1 array element overlaps self, the rest being zeros. Those are not going to be interesting. With mode='same' at least half of the shifted array overlaps the original one.
Also, to have the true correlation coefficient (r) you need to divide by the size of the overlap, not by the size of the original x. (in my code these are np.arange(n-1, n//2, -1)
). Then each of the outputs will be between -1 and 1.
A glance at Durbin–Watson statistic, which is similar to 2(1-r), suggests that people consider its values below 1 to be a significant indication of autocorrelation, which corresponds to r > 0.5. So this is what I use below. For a statistically sound treatment of the significance of autocorrelation refer to statistics literature; a starting point would be to have a model for your time series.
def autocorr(x):
n = x.size
norm = (x - np.mean(x))
result = np.correlate(norm, norm, mode='same')
acorr = result[n//2 + 1:] / (x.var() * np.arange(n-1, n//2, -1))
lag = np.abs(acorr).argmax() + 1
r = acorr[lag-1]
if np.abs(r) > 0.5:
print('Appears to be autocorrelated with r = {}, lag = {}'. format(r, lag))
else:
print('Appears to be not autocorrelated')
return r, lag
Output for your two toy examples:
Appears to be not autocorrelated
Appears to be autocorrelated with r = 1.0, lag = 4