Calculate time difference between Pandas Dataframe indices

执念已碎 2020-11-27 11:46

I am trying to add a column of deltaT to a dataframe where deltaT is the time difference between the successive rows (as indexed in the timeseries).

time             


        
3 Answers
  • 2020-11-27 12:05

    Note: this uses numpy >= 1.7. For numpy < 1.7, see the conversion described here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas

    Your original frame, with a datetime index

    In [196]: df
    Out[196]: 
                         value
    2012-03-16 23:50:00      1
    2012-03-16 23:56:00      2
    2012-03-17 00:08:00      3
    2012-03-17 00:10:00      4
    2012-03-17 00:12:00      5
    2012-03-17 00:20:00      6
    2012-03-20 00:43:00      7
    
    In [199]: df.index
    Out[199]: 
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2012-03-16 23:50:00, ..., 2012-03-20 00:43:00]
    Length: 7, Freq: None, Timezone: None
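The frame above is only shown as output. A minimal sketch of how an equivalent frame could be built, assuming the timestamps are available as strings (this construction is an assumption, not part of the original post):

```python
import pandas as pd

# Timestamps copied from the frame shown above.
times = pd.to_datetime([
    "2012-03-16 23:50:00",
    "2012-03-16 23:56:00",
    "2012-03-17 00:08:00",
    "2012-03-17 00:10:00",
    "2012-03-17 00:12:00",
    "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])

# Passing the parsed timestamps as the index gives a DatetimeIndex.
df = pd.DataFrame({"value": range(1, 8)}, index=times)
print(df)
```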
    

    Here is the difference you want, as a timedelta64 column

    In [200]: df['tvalue'] = df.index
    
    In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)
    
    In [202]: df
    Out[202]: 
                         value              tvalue            delta
    2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00
    2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00
    2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00
    2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00
    2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00
    2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00
    2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00
    

    Getting the answer while disregarding the day difference (your last row is on 3/20, the one before it on 3/17) is actually tricky: convert to whole minutes, then take the result modulo the number of minutes in a day

    In [204]: df['ans'] = df['delta'].apply(lambda x: x  / np.timedelta64(1,'m')).astype('int64') % (24*60)
    
    In [205]: df
    Out[205]: 
                         value              tvalue            delta  ans
    2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00    0
    2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00    6
    2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00   12
    2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00    2
    2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00    2
    2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00    8
    2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00   23
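The session above dates from an old pandas release. A minimal sketch of the same idea for a recent pandas, assuming a zero-length `pd.Timedelta` is used as the fill value instead of the integer 0 (the frame construction is also an assumption):

```python
import pandas as pd

times = pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=times)

# Difference between each timestamp and the previous one. The first row has
# no predecessor, so its NaT is replaced with a zero-length Timedelta.
delta = df.index.to_series().diff().fillna(pd.Timedelta(0))
df["delta"] = delta

# Whole minutes elapsed, then modulo one day to discard the day component.
minutes = delta // pd.Timedelta(minutes=1)
df["ans"] = (minutes % (24 * 60)).astype("int64")

print(df)
```

The modulo step is what turns the last row's 3 days 23 minutes into 23.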
    
  • 2020-11-27 12:17

    We can use to_series to create a series whose index and values both equal the index keys, then compute the differences between successive rows, which gives a timedelta64[ns] dtype. From there, the .dt accessor exposes the seconds attribute (the time-of-day portion of each difference, excluding whole days), and dividing each element by 60 gives minutes (optionally filling the first value with 0).

    In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
        ...: df                                 # use .astype(int) to obtain integer values
    Out[13]: 
                         value  deltaT
    time                              
    2012-03-16 23:50:00      1     0.0
    2012-03-16 23:56:00      2     6.0
    2012-03-17 00:08:00      3    12.0
    2012-03-17 00:10:00      4     2.0
    2012-03-17 00:12:00      5     2.0
    2012-03-17 00:20:00      6     8.0
    2012-03-20 00:43:00      7    23.0
    

    Step by step:

    First, take the difference between successive index values:

    In [8]: ser_diff = df.index.to_series().diff()
    
    In [9]: ser_diff
    Out[9]: 
    time
    2012-03-16 23:50:00               NaT
    2012-03-16 23:56:00   0 days 00:06:00
    2012-03-17 00:08:00   0 days 00:12:00
    2012-03-17 00:10:00   0 days 00:02:00
    2012-03-17 00:12:00   0 days 00:02:00
    2012-03-17 00:20:00   0 days 00:08:00
    2012-03-20 00:43:00   3 days 00:23:00
    Name: time, dtype: timedelta64[ns]
    

    Seconds to minutes conversion:

    In [10]: ser_diff.dt.seconds.div(60, fill_value=0)
    Out[10]: 
    time
    2012-03-16 23:50:00     0.0
    2012-03-16 23:56:00     6.0
    2012-03-17 00:08:00    12.0
    2012-03-17 00:10:00     2.0
    2012-03-17 00:12:00     2.0
    2012-03-17 00:20:00     8.0
    2012-03-20 00:43:00    23.0
    Name: time, dtype: float64
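To see why the last row comes out as 23 rather than 4343, it can help to look at how a timedelta splits into components. A small sketch, using two hand-built durations that mirror rows of the example (the variable names are illustrative):

```python
import pandas as pd

td = pd.Series([pd.Timedelta(minutes=6), pd.Timedelta(days=3, minutes=23)])

# .dt.components breaks each duration into days, hours, minutes, seconds, ...
parts = td.dt.components

# .dt.seconds covers only the part below one day:
# hours * 3600 + minutes * 60 + seconds. Whole days live in .dt.days.
rebuilt = parts["hours"] * 3600 + parts["minutes"] * 60 + parts["seconds"]

print(td.dt.days.tolist())     # [0, 3]
print(td.dt.seconds.tolist())  # [360, 1380]
print(rebuilt.tolist())        # [360, 1380]
```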
    

    If you also want to include the date portion, which was excluded above (only the time portion was considered), dt.total_seconds gives the full elapsed duration in seconds, from which minutes can again be calculated by division.

    In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0)
    Out[12]: 
    time
    2012-03-16 23:50:00       0.0
    2012-03-16 23:56:00       6.0
    2012-03-17 00:08:00      12.0
    2012-03-17 00:10:00       2.0
    2012-03-17 00:12:00       2.0
    2012-03-17 00:20:00       8.0
    2012-03-20 00:43:00    4343.0    # <-- number of minutes in 3 days 23 minutes
    Name: time, dtype: float64
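The 4343 figure can be checked by hand on a single duration. A minimal sketch with a scalar `pd.Timedelta` equal to the last gap in the example:

```python
import pandas as pd

gap = pd.Timedelta(days=3, minutes=23)

component_minutes = gap.seconds / 60        # time-of-day part only
total_minutes = gap.total_seconds() / 60    # whole duration

# The same total, rebuilt from the day count plus the sub-day part.
by_hand = gap.days * 24 * 60 + gap.seconds / 60

print(component_minutes)  # 23.0
print(total_minutes)      # 4343.0
print(by_hand)            # 4343.0
```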
    
  • 2020-11-27 12:32

    Requires NumPy >= 1.7.0.

    You can also cast df.index.to_series().diff() from timedelta64[ns] (nanoseconds, the default dtype) to timedelta64[m] (minutes). This is a frequency conversion; astyping is equivalent to floor division.

    df['ΔT'] = df.index.to_series().diff().astype('timedelta64[m]')
    
                         value      ΔT
    time                              
    2012-03-16 23:50:00      1     NaN
    2012-03-16 23:56:00      2     6.0
    2012-03-17 00:08:00      3    12.0
    2012-03-17 00:10:00      4     2.0
    2012-03-17 00:12:00      5     2.0
    2012-03-17 00:20:00      6     8.0
    2012-03-20 00:43:00      7  4343.0
    

    (ΔT dtype: float64)

    If you want to convert to int, fill the NA values with 0 before converting:

    >>> df.index.to_series().diff().fillna(0).astype('timedelta64[m]').astype('int')
    
    time
    2012-03-16 23:50:00       0
    2012-03-16 23:56:00       6
    2012-03-17 00:08:00      12
    2012-03-17 00:10:00       2
    2012-03-17 00:12:00       2
    2012-03-17 00:20:00       8
    2012-03-20 00:43:00    4343
    Name: time, dtype: int64
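The `astype('timedelta64[m]')` route depends on the pandas version; as far as I know, recent releases only accept second-or-finer resolutions in that cast. A hedged alternative sketch that reaches the same integers through floor division by a one-minute `pd.Timedelta` (the frame construction is an assumption):

```python
import pandas as pd

times = pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=times)

diff = df.index.to_series().diff()

# Fill the leading NaT with a zero-length Timedelta (not the integer 0),
# then floor-divide by one minute to get whole minutes.
whole_minutes = diff.fillna(pd.Timedelta(0)) // pd.Timedelta(minutes=1)
df["deltaT"] = whole_minutes.astype("int64")

print(df)
```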
    
    

    Timedelta data types support a large number of time units, as well as generic units which can be coerced into any of the other units.

    Below are the date units:

    Y   year
    M   month
    W   week
    D   day
    

    Below are the time units:

    h   hour
    m   minute
    s   second
    ms  millisecond
    us  microsecond
    ns  nanosecond
    ps  picosecond
    fs  femtosecond
    as  attosecond
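A short sketch of how these unit codes behave in plain NumPy, using a 90-minute duration picked for illustration:

```python
import numpy as np

ninety_min = np.timedelta64(90, "m")

# Casting to a coarser unit drops the fractional part.
as_hours = ninety_min.astype("timedelta64[h]")

# Casting to a finer unit is exact.
as_seconds = ninety_min.astype("timedelta64[s]")

# True division by a one-unit timedelta gives a float instead.
ratio = ninety_min / np.timedelta64(1, "h")

print(as_hours)    # 1 hours
print(as_seconds)  # 5400 seconds
print(ratio)       # 1.5
```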
    

    If you want the difference with decimals, use true division, i.e., divide by np.timedelta64(1, 'm').
    For example, if df is as below,

                         value
    time                      
    2012-03-16 23:50:21      1
    2012-03-16 23:56:28      2
    2012-03-17 00:08:08      3
    2012-03-17 00:10:56      4
    2012-03-17 00:12:12      5
    2012-03-17 00:20:00      6
    2012-03-20 00:43:43      7
    
    

    Compare astyping (floor division) with true division below.

    >>> df.index.to_series().diff().astype('timedelta64[m]')
    time
    2012-03-16 23:50:21       NaN
    2012-03-16 23:56:28       6.0
    2012-03-17 00:08:08      11.0
    2012-03-17 00:10:56       2.0
    2012-03-17 00:12:12       1.0
    2012-03-17 00:20:00       7.0
    2012-03-20 00:43:43    4343.0
    Name: time, dtype: float64
    
    >>> df.index.to_series().diff()/np.timedelta64(1, 'm')
    time
    2012-03-16 23:50:21            NaN
    2012-03-16 23:56:28       6.116667
    2012-03-17 00:08:08      11.666667
    2012-03-17 00:10:56       2.800000
    2012-03-17 00:12:12       1.266667
    2012-03-17 00:20:00       7.800000
    2012-03-20 00:43:43    4343.716667
    Name: time, dtype: float64
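The same contrast can be sketched with the `//` and `/` operators against a one-minute `pd.Timedelta`, which avoids the dtype cast altogether. This is an alternative spelling, not the code from the answer, and the timestamps are copied from the example frame:

```python
import pandas as pd

times = pd.to_datetime([
    "2012-03-16 23:50:21", "2012-03-16 23:56:28", "2012-03-17 00:08:08",
    "2012-03-17 00:10:56", "2012-03-17 00:12:12", "2012-03-17 00:20:00",
    "2012-03-20 00:43:43",
])
diff = times.to_series().diff()
one_minute = pd.Timedelta(minutes=1)

floored = diff // one_minute  # whole minutes, fraction discarded
exact = diff / one_minute     # fractional minutes

# Skip the first row, which has no predecessor.
print(floored.iloc[1:].tolist())
print(exact.iloc[1:].round(6).tolist())
```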
    
    
    