Cross-correlation (time-lag-correlation) with pandas?

囚心锁ツ 2021-01-30 01:49

I have various time series that I want to correlate, or rather cross-correlate, with each other, to find the time lag at which the correlation coefficient is greatest.

3 answers
  •  执念已碎
    2021-01-30 02:07

    There is a better approach: create a function that shifts your DataFrame first, and then call corr() on the result.

    Take this DataFrame as an example:

    import pandas as pd

    d = {'prcp': [0.1, 0.2, 0.3, 0.0], 'stp': [0.0, 0.1, 0.2, 0.3]}
    df = pd.DataFrame(data=d)
    
    >>> df
       prcp  stp
    0   0.1  0.0
    1   0.2  0.1
    2   0.3  0.2
    3   0.0  0.3
    

    A function that shifts every column except the target:

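    def df_shifted(df, target=None, lag=0):
        """Return a copy of df with every column except `target` shifted by `lag` rows."""
        # Nothing to do when no lag and no target are given.
        if not lag and not target:
            return df
        new = {}
        for c in df.columns:
            if c == target:
                # Keep the target column aligned to the original index.
                new[c] = df[target]
            else:
                # A negative lag moves values up; vacated rows become NaN.
                new[c] = df[c].shift(periods=lag)
        return pd.DataFrame(data=new)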
    

    Suppose you want to compare prcp (precipitation) with stp (atmospheric pressure).

    With the unshifted data, the correlation is:

    >>> df.corr()
          prcp  stp
    prcp   1.0 -0.2
    stp   -0.2  1.0
    

    But if you shift all the other columns by one period and keep the target (prcp) in place:

    df_new = df_shifted(df, 'prcp', lag=-1)
    
    >>> print(df_new)
       prcp  stp
    0   0.1  0.1
    1   0.2  0.2
    2   0.3  0.3
    3   0.0  NaN
    

    Note that the stp column is now shifted up by one position, so calling corr() gives:

    >>> df_new.corr()
          prcp  stp
    prcp   1.0  1.0
    stp    1.0  1.0
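
One thing worth checking with this result: corr() skips rows where either value is NaN, so the 1.0 above is computed from only three overlapping rows. The sketch below rebuilds the same toy data, counts the overlap, and uses the min_periods argument of Series.corr to get NaN back when too few rows remain. The threshold of 4 is an arbitrary value chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'prcp': [0.1, 0.2, 0.3, 0.0],
                   'stp': [0.0, 0.1, 0.2, 0.3]})

# Shift stp up by one row, as in the lag=-1 example.
shifted_stp = df['stp'].shift(periods=-1)

# Number of rows where both series still have a value after the shift.
overlap = int((df['prcp'].notna() & shifted_stp.notna()).sum())

# Pearson correlation, computed over the overlapping rows only.
r = df['prcp'].corr(shifted_stp)

# Requiring more observations than the overlap yields NaN instead of a number.
r_strict = df['prcp'].corr(shifted_stp, min_periods=4)

print(overlap, r, r_strict)
```

On short series, a large lag can leave so few rows that a high correlation means little, so it may help to report the overlap next to each coefficient.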
    

    So you can repeat this with a lag of -1, -2, and so on up to -n, and compare the resulting correlations.
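
To answer the original question (which lag gives the greatest correlation), the same shift-then-correlate idea can be wrapped in a loop over candidate lags. This is a minimal sketch, not part of the answer above: best_lag is a hypothetical helper name, and min_periods=3 is an assumed cutoff that drops lags with too little overlap.

```python
import pandas as pd

def best_lag(df, target, other, lags, min_periods=3):
    """Hypothetical helper: correlate `target` with `other` shifted by each lag."""
    scores = {}
    for lag in lags:
        # A negative lag pairs target[t] with other[t + |lag|].
        shifted = df[other].shift(periods=lag)
        scores[lag] = df[target].corr(shifted, min_periods=min_periods)
    # Lags with fewer than min_periods overlapping rows come back as NaN; drop them.
    scores = pd.Series(scores, name='corr').dropna()
    # idxmax returns the lag with the largest coefficient
    # (use scores.abs().idxmax() if strong negative correlation also counts).
    return scores.idxmax(), scores

df = pd.DataFrame({'prcp': [0.1, 0.2, 0.3, 0.0],
                   'stp': [0.0, 0.1, 0.2, 0.3]})

lag, scores = best_lag(df, 'prcp', 'stp', lags=range(-2, 3))
print(lag)
print(scores)
```

On the four-row example, lags -2 and 2 leave only two overlapping rows and are dropped, and lag -1 comes out on top. If every lag is dropped, idxmax raises an error, so real data needs a wide enough window for the lags being tried.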
