How to round a Pandas `DatetimeIndex`?

栀梦 2021-01-11 10:04

I have a pandas.DatetimeIndex, e.g.:

pd.date_range('2012-1-1 02:03:04.000', periods=3, freq='1ms')
>>> [2012-01-01 02:03:04, ..., 201         


        
4 answers
  • 2021-01-11 10:42

    The round() method was added to DatetimeIndex, Timestamp, TimedeltaIndex and Timedelta in pandas 0.18.0, so you can now do the following:

    In[114]: index = pd.DatetimeIndex([pd.Timestamp('2012-01-01 02:03:04.000'), pd.Timestamp('2012-01-01 02:03:04.002'), pd.Timestamp('20130712 02:03:04.500'), pd.Timestamp('2012-01-01 02:03:04.501')])
    
    In[115]: index.values
    Out[115]: 
    array(['2012-01-01T02:03:04.000000000', '2012-01-01T02:03:04.002000000',
           '2013-07-12T02:03:04.500000000', '2012-01-01T02:03:04.501000000'], dtype='datetime64[ns]')
    
    In[116]: index.round('S')
    Out[116]: 
    DatetimeIndex(['2012-01-01 02:03:04', '2012-01-01 02:03:04',
                   '2013-07-12 02:03:04', '2012-01-01 02:03:05'],
                  dtype='datetime64[ns]', freq=None)
    

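    round() accepts a frequency parameter; the valid string aliases are listed under "offset aliases" in the pandas time series documentation. Recent pandas releases (2.2 and later) deprecate the uppercase 'S' in favour of the lowercase 's'.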
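
    To make the effect of the frequency argument concrete, here is a minimal sketch that compares round() with its companion methods floor() and ceil(). The sample index is hypothetical and was chosen only because it straddles the half-second mark.

```python
import pandas as pd

# Hypothetical sample index, chosen to straddle the half-second mark.
idx = pd.DatetimeIndex(['2012-01-01 02:03:04.499',
                        '2012-01-01 02:03:04.501'])

nearest = idx.round('s')   # nearest whole second
down = idx.floor('s')      # always towards the earlier second
up = idx.ceil('s')         # always towards the later second

print(nearest)
print(down)
print(up)
```

    Here round() yields 02:03:04 and 02:03:05, floor() yields 02:03:04 twice, and ceil() yields 02:03:05 twice.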

  • 2021-01-11 10:42

    There is little point in changing the index itself, since you can simply generate a new one using date_range with the desired frequency parameter, as in your question.

    I assume what you are trying to do is change the frequency of a time series that contains data, in which case you can use resample (see the pandas documentation). For example, if you have the following time series:

    from numpy.random import randn
    
    dt_index = pd.date_range('2012-01-01 00:00:00', periods=3, freq='1ms')
    ts = pd.Series(randn(3), index=dt_index)
    
    
    2012-01-01 00:00:00           0.594618
    2012-01-01 00:00:00.001000    0.874552
    2012-01-01 00:00:00.002000   -0.700076
    Freq: L
    

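    Then you can change the frequency to seconds using resample, specifying how you want to aggregate the values (mean, sum, etc.). The output below comes from an old pandas release whose bins were closed and labelled on the right; current releases default to the left for this frequency, so both arguments are passed explicitly here:

    # how= was removed from resample; call the aggregation as a method instead.
    ts.resample('s', closed='right', label='right').sum()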
    
    2012-01-01 00:00:00    0.594618
    2012-01-01 00:00:01    0.174475
    Freq: S
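
    Because the series above holds random values, here is a small deterministic sketch of the same idea. The three readings are hypothetical, and the bins use what I understand to be the current defaults for this frequency (closed and labelled on the left).

```python
import pandas as pd

# Hypothetical data: three readings spread over two different seconds.
index = pd.DatetimeIndex(['2012-01-01 00:00:00.000',
                          '2012-01-01 00:00:00.500',
                          '2012-01-01 00:00:01.250'])
ts = pd.Series([1.0, 2.0, 3.0], index=index)

# Group the readings into one-second bins and aggregate each bin.
per_second_sum = ts.resample('s').sum()
per_second_mean = ts.resample('s').mean()

print(per_second_sum)
print(per_second_mean)
```

    The first two readings fall into the 00:00:00 bin and the third into the 00:00:01 bin, so the sums are 3.0 and 3.0 and the means are 1.5 and 3.0. Unlike rounding the index, this combines the data values that share a bin.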
    
  • 2021-01-11 10:57

    Update: if you're doing this to a DatetimeIndex / datetime64 column, a better way is to use np.round directly rather than via an apply/map. The -9 rounds the underlying nanosecond integers to the nearest 10^9, i.e. to whole seconds:

    np.round(dtindex_or_datetime_col.astype(np.int64), -9).astype('datetime64[ns]')
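
    The same idea can be sketched with plain integer arithmetic, which avoids any floating-point step. This is only an illustration under stated assumptions: the sample index and the names ns and one_second are hypothetical, and exact halves are rounded up here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample index.
idx = pd.DatetimeIndex(['2012-01-01 02:03:04.499',
                        '2012-01-01 02:03:04.501'])

# Convert the timestamps to int64 nanoseconds since the Unix epoch.
ns = idx.values.astype('datetime64[ns]').astype(np.int64)

one_second = 1_000_000_000  # nanoseconds per second
# Add half a second, then floor-divide: exact halves round up.
rounded_ns = (ns + one_second // 2) // one_second * one_second

rounded = pd.DatetimeIndex(rounded_ns.astype('datetime64[ns]'))
print(rounded)
```

    The result holds 02:03:04 and 02:03:05. On pandas 0.18.0 or later, the built-in round() method is the simpler choice.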
    

    Old answer (with some more explanation):

    Whilst @Matti's answer is clearly the correct way to deal with your situation, I thought I would add an answer showing how you might round a single Timestamp to the nearest second:

    from pandas import Timestamp
    
    t1 = Timestamp('2012-1-1 00:00:00')
    t2 = Timestamp('2012-1-1 00:00:00.000333')
    
    In [4]: t1
    Out[4]: Timestamp('2012-01-01 00:00:00')
    
    In [5]: t2
    Out[5]: Timestamp('2012-01-01 00:00:00.000333')
    
    In [6]: t2.microsecond
    Out[6]: 333
    
    In [7]: t1.value  # nanoseconds since the Unix epoch
    Out[7]: 1325376000000000000
    
    In [8]: t2.value
    Out[8]: 1325376000000333000
    
    # Alternatively, to truncate instead of round: t2.value - t2.value % 1000000000
    In [9]: round(t2.value, -9)  # round away milli-, micro- and nanoseconds
    Out[9]: 1325376000000000000
    
    In [10]: Timestamp(round(t2.value, -9))
    Out[10]: Timestamp('2012-01-01 00:00:00')
    

    Hence you can apply this to the entire index:

    def to_the_second(ts):
        return Timestamp(round(ts.value, -9))  # Python 3: round() on an int returns an int
    
    dtindex.map(to_the_second)
    
  • 2021-01-11 11:08

    For more general rounding, you can make use of the fact that Pandas Timestamp objects mostly use the standard library datetime.datetime API, including the datetime.datetime.replace() method.

    So, to solve your microsecond rounding problem, you could do:

    import datetime
    import pandas as pd
    
    times = pd.date_range('2012-1-1 02:03:04.499',periods=3,freq='1ms')
    # Add 5e5 microseconds and truncate to simulate rounding
    times_rounded = [(x + datetime.timedelta(microseconds=5e5)).replace(microsecond=0) for x in times]
    
    from IPython.display import display
    print('Before:')
    display(list(times))
    print('After:')
    display(list(times_rounded))
    

    Output:

    Before:
    [Timestamp('2012-01-01 02:03:04.499000', offset='L'),
     Timestamp('2012-01-01 02:03:04.500000', offset='L'),
     Timestamp('2012-01-01 02:03:04.501000', offset='L')]
    After:
    [Timestamp('2012-01-01 02:03:04', offset='L'),
     Timestamp('2012-01-01 02:03:05', offset='L'),
     Timestamp('2012-01-01 02:03:05', offset='L')]
    

    You can use the same technique to, e.g., round to the nearest day (as long as you're not concerned about leap seconds and the like):

    times = pd.date_range('2012-1-1 08:00:00', periods=3, freq='4h')
    # Add 12 hours, then zero every time-of-day field (including minute)
    times_rounded = [(x + datetime.timedelta(hours=12)).replace(hour=0, minute=0, second=0, microsecond=0) for x in times]
    

    Inspired by this SO post: https://stackoverflow.com/a/19718411/1410871
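
    The add-half-then-truncate trick can also be applied to the whole index at once instead of element by element. The following is a sketch, not part of the original answer: it assumes a pandas version that provides floor() (0.18.0 or later), and the names nearest_day and nearest_second are hypothetical.

```python
import pandas as pd

# Nearest day: add half a day, then truncate to midnight.
times = pd.date_range('2012-01-01 08:00:00', periods=3, freq='4h')
nearest_day = (times + pd.Timedelta(hours=12)).floor('D')

# Nearest second: add half a second, then truncate to the whole second.
ms_times = pd.date_range('2012-01-01 02:03:04.499', periods=3, freq='1ms')
nearest_second = (ms_times + pd.Timedelta(milliseconds=500)).floor('s')

print(list(nearest_day))
print(list(nearest_second))
```

    The first index becomes 2012-01-01, 2012-01-02, 2012-01-02, and the second becomes 02:03:04, 02:03:05, 02:03:05, matching the list-comprehension output above. Note that exact halves go up with this approach, whereas the round() output in the first answer shows 02:03:04.500 going to 02:03:04.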
