Cross-correlation (time-lag-correlation) with pandas?

囚心锁ツ 2021-01-30 01:49

I have various time series that I want to correlate, or rather cross-correlate, with each other to find the time lag at which the correlation coefficient is greatest.

3 answers
  • 2021-01-30 02:07

    There is a better approach: you can write a function that shifts your DataFrame before calling corr().

    Take this DataFrame as an example:

    import pandas as pd

    d = {'prcp': [0.1,0.2,0.3,0.0], 'stp': [0.0,0.1,0.2,0.3]}
    df = pd.DataFrame(data=d)
    
    >>> df
       prcp  stp
    0   0.1  0.0
    1   0.2  0.1
    2   0.3  0.2
    3   0.0  0.3
    

    A function that shifts every column except the target:

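    def df_shifted(df, target=None, lag=0):
        # Nothing to shift: return the frame unchanged.
        if not lag and not target:
            return df
        new = {}
        for c in df.columns:
            if c == target:
                # Keep the target column in place.
                new[c] = df[target]
            else:
                # Shift every other column by `lag` periods.
                new[c] = df[c].shift(periods=lag)
        return pd.DataFrame(data=new)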
    

    Suppose you want to compare prcp (precipitation) with stp (atmospheric pressure).

    Without any shift, the correlation is:

    >>> df.corr()
          prcp  stp
    prcp   1.0 -0.2
    stp   -0.2  1.0
    

    But if you shift all the other columns by one period and keep the target (prcp) in place:

    df_new = df_shifted(df, 'prcp', lag=-1)
    
    >>> print(df_new)
       prcp  stp
    0   0.1  0.1
    1   0.2  0.2
    2   0.3  0.3
    3   0.0  NaN
    

    Note that the stp column is now shifted up by one period, so calling corr() gives:

    >>> df_new.corr()
          prcp  stp
    prcp   1.0  1.0
    stp    1.0  1.0
    

    You can do the same with a lag of -1, -2, ..., -n.
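
    To answer the original question (which lag gives the greatest correlation), the shift-then-correlate idea can be put in a loop. The following is only a sketch under stated assumptions: `best_lag` is a hypothetical helper, not part of pandas, and `min_periods=3` is an arbitrary guard so that lags leaving fewer than three overlapping points are reported as NaN rather than as a trivially perfect correlation.

```python
import pandas as pd

# Example data from the answer above.
df = pd.DataFrame({"prcp": [0.1, 0.2, 0.3, 0.0],
                   "stp": [0.0, 0.1, 0.2, 0.3]})


def best_lag(df, target, other, lags, min_periods=3):
    """Hypothetical helper: return (lag, correlation) for the lag at which
    `other`, shifted by that lag, correlates most strongly with `target`."""
    scores = pd.Series({
        # Series.corr returns NaN when fewer than `min_periods` pairs overlap.
        lag: df[target].corr(df[other].shift(lag), min_periods=min_periods)
        for lag in lags
    })
    scores = scores.dropna()   # discard lags with too little overlap
    lag = scores.idxmax()      # lag with the largest correlation
    return lag, scores[lag]


lag, corr = best_lag(df, "prcp", "stp", lags=range(-2, 3))
print(lag, corr)
```

    With only four rows the result is not statistically meaningful; the point is the mechanics of scanning lags. To rank by strength regardless of sign, one could take `scores.abs().idxmax()` instead.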

  • 2021-01-30 02:21

    To build on Andre's answer: if you only care about the (lagged) correlation with the target, but want to test various lags (e.g. to see which lag gives the highest correlation), you can do something like this, with target set to the name of the target column and max_lag to the number of lags to test:

    lagged_correlation = pd.DataFrame.from_dict(
        {x: [df[target].corr(df[x].shift(-t)) for t in range(max_lag)] for x in df.columns})
    

    This way, each row corresponds to a different lag value, and each column corresponds to a different variable (one of them is the target itself, giving the autocorrelation).
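
    Here is a self-contained sketch of that one-liner on synthetic data, so the result can be checked. The data, the column names "x" and "y", and the values of `target` and `max_lag` are all assumptions made for illustration: "y" is constructed as "x" delayed by three steps, so the best lag for "y" should come out as 3.

```python
import numpy as np
import pandas as pd

# Synthetic data (illustrative assumption): "y" is "x" delayed by 3 steps.
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=200))
df = pd.DataFrame({"x": x, "y": x.shift(3)}).dropna().reset_index(drop=True)

target = "x"   # column every other column is compared against
max_lag = 6    # lags 0 .. max_lag - 1 are tested

# Row index = lag, one column per variable (the target column holds its autocorrelation).
lagged_correlation = pd.DataFrame.from_dict(
    {col: [df[target].corr(df[col].shift(-t)) for t in range(max_lag)]
     for col in df.columns})

# For each column, the lag (row label) at which the correlation peaks.
best_lag_per_column = lagged_correlation.idxmax()
print(best_lag_per_column)
```

    Because the row labels are the lags themselves, `idxmax()` directly returns the best lag for each column.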

  • 2021-01-30 02:30

    As far as I can tell, there isn't a built-in method that does exactly what you are asking. But if you look at the source code for the pandas Series method autocorr, you can see you've got the right idea:

    def autocorr(self, lag=1):
        """
        Lag-N autocorrelation
    
        Parameters
        ----------
        lag : int, default 1
            Number of lags to apply before performing autocorrelation.
    
        Returns
        -------
        autocorr : float
        """
        return self.corr(self.shift(lag))
    

    So a simple time-lagged cross-correlation function would be

    def crosscorr(datax, datay, lag=0):
        """ Lag-N cross correlation. 
        Parameters
        ----------
        lag : int, default 0
        datax, datay : pandas.Series objects of equal length
    
        Returns
        ----------
        crosscorr : float
        """
        return datax.corr(datay.shift(lag))
    

    Then, if you wanted to look at the cross-correlation at lags of 0 to 11 periods (months, for monthly data), you could do

    xcov_monthly = [crosscorr(datax, datay, lag=i) for i in range(12)]
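
    To go from the list of correlations to the lag with the highest value, the results can be indexed by lag and passed to idxmax(). A minimal sketch, assuming synthetic data in which datay is datax moved two steps earlier, and scanning both negative and positive lags (the range -5 to 5 is an arbitrary choice):

```python
import numpy as np
import pandas as pd


def crosscorr(datax, datay, lag=0):
    """Lag-N cross-correlation, as defined in the answer above."""
    return datax.corr(datay.shift(lag))


# Synthetic series (illustrative assumption): datay[i] == datax[i + 2].
rng = np.random.default_rng(42)
datax = pd.Series(rng.normal(size=300))
datay = datax.shift(-2)

lags = list(range(-5, 6))
# Index the correlations by lag so the peak can be looked up by label.
xcorr = pd.Series([crosscorr(datax, datay, lag=i) for i in lags], index=lags)

peak_lag = xcorr.idxmax()
print(peak_lag, xcorr.loc[peak_lag])
```

    Shifting datay forward by two steps realigns it with datax, so the peak appears at lag 2.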
    