Pandas Rolling Computations on Sliding Windows (Unevenly spaced)

北恋 2020-12-03 06:02

Suppose you have some unevenly spaced time series data:

import pandas as pd
import random as randy
ts = pd.Series(range(1000),index=randy.sample(pd.date_range(\         


        
4 answers
  • 2020-12-03 06:08

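    Perhaps it makes more sense to use rolling_sum (a top-level function in older pandas releases; later versions removed it in favor of the .rolling() API):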

    pd.rolling_sum(ts, window=1, freq='1ms')
    
  • 2020-12-03 06:09

    How about something like this:

    Create an offset for 1 ms:

    In [1]: ms = pd.tseries.offsets.Milli()
    

    Create a series of index positions the same length as your timeseries:

    In [2]: s = pd.Series(range(len(ts)))
    

    Apply a lambda function that looks up the timestamp at position x in the ts index. The function returns the sum of all ts entries between ts.index[x] - ms and ts.index[x].

    In [3]: s.apply(lambda x: ts.between_time(start_time=ts.index[x]-ms, end_time=ts.index[x]).sum())
    
    In [4]: ts.head()
    Out[4]:
    2013-02-01 09:00:00.000558    348
    2013-02-01 09:00:00.000647    361
    2013-02-01 09:00:00.000726    312
    2013-02-01 09:00:00.001012    550
    2013-02-01 09:00:00.002208    758
    

    Results of the above function:

    0     348
    1     709
    2    1021
    3    1571
    4     758
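The same per-row lookback can also be sketched with a label-based slice, since between_time selects on time of day rather than on full timestamps. This is only an illustration, assuming a sorted DatetimeIndex; it rebuilds the five rows shown in ts.head() above, and the names ts_head and window_sums are made up for the example.

```python
import pandas as pd

# The five rows from ts.head() above, rebuilt by hand for illustration.
ts_head = pd.Series(
    [348, 361, 312, 550, 758],
    index=pd.to_datetime([
        "2013-02-01 09:00:00.000558",
        "2013-02-01 09:00:00.000647",
        "2013-02-01 09:00:00.000726",
        "2013-02-01 09:00:00.001012",
        "2013-02-01 09:00:00.002208",
    ]),
)

ms = pd.Timedelta(milliseconds=1)

# For each timestamp t, sum every entry whose label lies in [t - 1 ms, t].
# Label slicing on a sorted DatetimeIndex includes both endpoints.
window_sums = pd.Series(
    [ts_head.loc[t - ms : t].sum() for t in ts_head.index],
    index=ts_head.index,
)
print(window_sums)
```

Like the apply-based version, this does one slice per row, so it is easy to read but not the fastest option for long series.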
    
  • 2020-12-03 06:14

    You can solve most problems of this sort with cumsum and binary search.

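    import numpy as np
    import pandas as pd
    from datetime import timedelta
    
    def msum(s, lag_in_ms):
        # s must have a sorted DatetimeIndex
        lag = s.index - timedelta(milliseconds=lag_in_ms)
        # position of the first sample at or after each lagged timestamp
        inds = np.searchsorted(s.index.astype(np.int64), lag.astype(np.int64))
        cs = s.cumsum().values
        # cumsum at the current row, minus cumsum at the window start, plus the start value itself
        return pd.Series(cs - cs[inds] + s.values[inds], index=s.index)
    
    res = msum(ts, 100)
    print(pd.DataFrame({'a': ts, 'a_msum_100': res}))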
    
    
                                a  a_msum_100
    2013-02-01 09:00:00.073479  5           5
    2013-02-01 09:00:00.083717  8          13
    2013-02-01 09:00:00.162707  1          14
    2013-02-01 09:00:00.171809  6          20
    2013-02-01 09:00:00.240111  7          14
    2013-02-01 09:00:00.258455  0          14
    2013-02-01 09:00:00.336564  2           9
    2013-02-01 09:00:00.536416  3           3
    2013-02-01 09:00:00.632439  4           7
    2013-02-01 09:00:00.789746  9           9
    
    [10 rows x 2 columns]
    

    You still need a way of handling NaNs. Depending on your application, you may or may not need the prevailing value as of the lagged time (i.e. the difference between using kdb+ bin and np.searchsorted).
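To make the last remark concrete, here is a small sketch of the two lookups using only NumPy. The arrays are hypothetical integer "times" chosen for brevity, and reading kdb+ bin as "last sample at or before" (equivalent to side='right' minus one) is an assumption about what the answer means.

```python
import numpy as np

# Hypothetical sorted sample times and lagged lookup times.
times = np.array([10, 20, 30])
lagged = np.array([15, 20, 35])

# side='left': index of the first sample at or after each lagged time.
# An index equal to len(times) means no such sample exists.
first_at_or_after = np.searchsorted(times, lagged, side="left")

# side='right' minus one: index of the last sample at or before each
# lagged time, i.e. the prevailing value "as of" that time.
# A result of -1 would mean no earlier sample exists.
last_at_or_before = np.searchsorted(times, lagged, side="right") - 1

print(first_at_or_after, last_at_or_before)
```

The two results agree only where a lagged time coincides exactly with a sample time (20 here), which is why the choice matters for unevenly spaced data.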

    Hope this helps.

  • 2020-12-03 06:18

    This is an old question, but for those who stumble upon it from Google: as of pandas 0.19 this is built in as time-aware rolling:

    http://pandas.pydata.org/pandas-docs/stable/computation.html#time-aware-rolling

    So to get 1 ms windows, it looks like you get a Rolling object (dft being a DataFrame or Series with a DatetimeIndex) by doing

    dft.rolling('1ms')
    

    and the sum would be

    dft.rolling('1ms').sum()
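As a quick check, the time-aware rolling sum can be tried on the five sample rows printed in one of the earlier answers. This is a minimal sketch, assuming pandas 0.19 or later and a monotonically increasing DatetimeIndex; ts_sample and rolled are illustrative names, not from the original posts.

```python
import pandas as pd

# Five sample rows copied from the ts.head() output in an earlier answer.
ts_sample = pd.Series(
    [348, 361, 312, 550, 758],
    index=pd.to_datetime([
        "2013-02-01 09:00:00.000558",
        "2013-02-01 09:00:00.000647",
        "2013-02-01 09:00:00.000726",
        "2013-02-01 09:00:00.001012",
        "2013-02-01 09:00:00.002208",
    ]),
)

# An offset string makes the window time-based: each row sums the entries
# in the 1 ms leading up to and including its own timestamp.
rolled = ts_sample.rolling("1ms").sum()
print(rolled)
```

With an offset window the default interval is closed on the right only, so an entry exactly 1 ms before the current timestamp is excluded unless closed='both' is passed.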
    