I have various time series, that I want to correlate - or rather, cross-correlate - with each other, to find out at which time lag the correlation factor is the greatest.
<
There is a better approach: You can create a function that shifted your dataframe first before calling the corr().
Get this dataframe like an example:
d = {'prcp': [0.1,0.2,0.3,0.0], 'stp': [0.0,0.1,0.2,0.3]}
df = pd.DataFrame(data=d)
>>> df
prcp stp
0 0.1 0.0
1 0.2 0.1
2 0.3 0.2
3 0.0 0.3
Your function to shift others columns (except the target):
def df_shifted(df, target=None, lag=0):
if not lag and not target:
return df
new = {}
for c in df.columns:
if c == target:
new[c] = df[target]
else:
new[c] = df[c].shift(periods=lag)
return pd.DataFrame(data=new)
Supposing that your target is comparing the prcp (precipitation variable) with stp(atmospheric pressure)
If you do at the present will be:
>>> df.corr()
prcp stp
prcp 1.0 -0.2
stp -0.2 1.0
But if you shifted 1(one) period all other columns and keep the target (prcp):
df_new = df_shifted(df, 'prcp', lag=-1)
>>> print df_new
prcp stp
0 0.1 0.1
1 0.2 0.2
2 0.3 0.3
3 0.0 NaN
Note that now the column stp is shift one up position at period, so if you call the corr(), will be:
>>> df_new.corr()
prcp stp
prcp 1.0 1.0
stp 1.0 1.0
So, you can do with lag -1, -2, -n!!