Time Series Analysis - unevenly spaced measures - pandas + statsmodels

闹比i 2020-11-29 11:14

I have two numpy arrays light_points and time_points and would like to use some time series analysis methods on those data.

I then tried this:

impor         


        
1 Answer
  • 2020-11-29 11:35

    seasonal_decompose() requires a freq that is either part of the DatetimeIndex meta information, inferred via pandas.Index.inferred_freq, or supplied by the user as an integer giving the number of periods per cycle, e.g., 12 for monthly data (from the docstring for seasonal_mean):

    def seasonal_decompose(x, model="additive", filt=None, freq=None):
        """
        Parameters
        ----------
        x : array-like
            Time series
        model : str {"additive", "multiplicative"}
            Type of seasonal component. Abbreviations are accepted.
        filt : array-like
            The filter coefficients for filtering out the seasonal component.
            The default is a symmetric moving average.
        freq : int, optional
            Frequency of the series. Must be used if x is not a pandas
            object with a timeseries index.
    

    To illustrate - using random sample data:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    length = 400
    x = np.sin(np.arange(length)) * 10 + np.random.randn(length)
    df = pd.DataFrame(data=x, index=pd.date_range(start='2015-01-01', periods=length, freq='W'), columns=['value'])
    
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 400 entries, 2015-01-04 to 2022-08-28
    Freq: W-SUN
    
    decomp = sm.tsa.seasonal_decompose(df)
    data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
    data.columns = ['series', 'trend', 'seasonal', 'resid']
    
    Data columns (total 4 columns):
    series      400 non-null float64
    trend       348 non-null float64
    seasonal    400 non-null float64
    resid       348 non-null float64
    dtypes: float64(4)
    memory usage: 15.6 KB
    

    So far, so good - now randomly dropping elements from the DatetimeIndex to create unevenly spaced data:

    df = df.iloc[np.unique(np.random.randint(low=0, high=length, size=int(length * .8)))]
    
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 222 entries, 2015-01-11 to 2022-08-21
    Data columns (total 1 columns):
    value    222 non-null float64
    dtypes: float64(1)
    memory usage: 3.5 KB
    
    df.index.freq
    
    None
    
    df.index.inferred_freq
    
    None
    
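    To see how irregular the spacing actually is, it can help to look at the time between consecutive index entries. The following is a minimal sketch, assuming a stand-in index built by dropping rows from a weekly grid (the names `full_index`, `keep` and `gaps` are illustrative and not part of the code above):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the thinned-out weekly index above.
rng = np.random.default_rng(0)
full_index = pd.date_range(start="2015-01-04", periods=400, freq="W")
keep = np.sort(rng.choice(400, size=220, replace=False))
index = full_index[keep]

# Time elapsed between consecutive observations.
gaps = index.to_series().diff().dropna()

# On a regular weekly index every gap would be exactly 7 days.
gap_counts = gaps.value_counts().sort_index()
print(gap_counts)
print("inferred_freq:", index.inferred_freq)
```

    If more than one gap size shows up, pandas has no single frequency to infer, which is why both attributes come back empty.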

    Running seasonal_decompose() on this data 'works':

    # statsmodels 0.11 and later name this argument `period` instead of `freq`
    decomp = sm.tsa.seasonal_decompose(df, freq=52)
    
    data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
    data.columns = ['series', 'trend', 'seasonal', 'resid']
    
    DatetimeIndex: 224 entries, 2015-01-04 to 2022-08-07
    Data columns (total 4 columns):
    series      224 non-null float64
    trend       172 non-null float64
    seasonal    224 non-null float64
    resid       172 non-null float64
    dtypes: float64(4)
    memory usage: 8.8 KB
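    Note that the integer passed as freq counts rows per cycle, not weeks, so on the thinned-out index 52 rows no longer span one year. One possible way around this is to put the observations back on a regular grid and fill the gaps before decomposing. The following is a sketch under assumptions: the sample data and the names `irregular`, `regular` and `filled` are made up for illustration, and linear interpolation in time is only one of several fill strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical unevenly spaced series: a weekly grid with rows removed.
rng = np.random.default_rng(0)
length = 400
full_index = pd.date_range(start="2015-01-04", periods=length, freq="W")
values = np.sin(np.arange(length)) * 10 + rng.standard_normal(length)
keep = np.sort(rng.choice(length, size=220, replace=False))
irregular = pd.Series(values[keep], index=full_index[keep], name="value")

# Put the observations back on a regular weekly grid; missing weeks become NaN.
regular = irregular.asfreq("W")

# Fill the gaps by linear interpolation in time. This is an assumption about
# the data; other fill strategies may suit a given series better.
filled = regular.interpolate(method="time")

print(filled.index.inferred_freq)  # a regular frequency can be inferred again
print(filled.isna().sum())         # no missing values remain
```

    Here the surviving timestamps happen to sit exactly on the weekly grid. For measurements taken at arbitrary times, something like `.resample("W").mean()` would probably be needed first to bin them onto a grid.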
    

    The question is how useful the result is. Even without gaps in the data that complicate inference of seasonal patterns (see the example use of .interpolate() in the release notes), statsmodels qualifies this procedure as follows:

    Notes
    -----
    This is a naive decomposition. More sophisticated methods should
    be preferred.
    
    The additive model is Y[t] = T[t] + S[t] + e[t]
    
    The multiplicative model is Y[t] = T[t] * S[t] * e[t]
    
    The seasonal component is first removed by applying a convolution
    filter to the data. The average of this smoothed series for each
    period is the returned seasonal component.
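    To make the additive model in these notes concrete, here is a rough sketch of such a naive decomposition written directly in pandas and numpy. It loosely follows the steps described above and is not the statsmodels source; the function name `naive_additive_decompose` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def naive_additive_decompose(x: pd.Series, period: int) -> pd.DataFrame:
    """Sketch of Y[t] = T[t] + S[t] + e[t] for an evenly spaced series."""
    # Trend: centered moving average spanning one full cycle.
    if period % 2 == 0:
        # Even period: half weights on both ends keep the window centered.
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = pd.Series(np.nan, index=x.index)
    trend.iloc[half:len(x) - half] = np.convolve(x.to_numpy(), weights, mode="valid")

    # Seasonal: average detrended value at each position in the cycle,
    # shifted so that the seasonal effects average to zero over one cycle.
    detrended = x - trend
    position = np.arange(len(x)) % period  # row number, not calendar time
    period_means = detrended.groupby(position).mean()
    period_means -= period_means.mean()
    seasonal = pd.Series(period_means.to_numpy()[position], index=x.index)

    # Residual: whatever trend and seasonal component do not explain.
    resid = x - trend - seasonal
    return pd.DataFrame(
        {"series": x, "trend": trend, "seasonal": seasonal, "resid": resid}
    )


# Usage on evenly spaced sample data similar to the example above.
rng = np.random.default_rng(0)
idx = pd.date_range(start="2015-01-04", periods=400, freq="W")
x = pd.Series(np.sin(np.arange(400)) * 10 + rng.standard_normal(400), index=idx)
result = naive_additive_decompose(x, period=52)
print(result.count())
```

    The position within the cycle is derived from the row number alone, so the sketch silently assumes equal spacing. That is the reason gaps in the index undermine the seasonal estimate.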
    