Calculate time difference between Pandas Dataframe indices


I am trying to add a column of deltaT to a dataframe where deltaT is the time difference between the successive rows (as indexed in the timeseries).

time


                      


      
      
        
3 Answers

        
                         				            
            
           
            
                              
                
              
              
                
                  醉梦人生        
                
              
                            
                2020-11-27 12:05
              
            
            
                                                                       
Note: this uses numpy >= 1.7. For numpy < 1.7, see the conversion described here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas

Your original frame, with a datetime index

In [196]: df
Out[196]: 
                     value
2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7

In [199]: df.index
Out[199]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-16 23:50:00, ..., 2012-03-20 00:43:00]
Length: 7, Freq: None, Timezone: None


Here is the difference between successive rows, stored as a timedelta64 column:

In [200]: df['tvalue'] = df.index

In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(pd.Timedelta(0))

In [202]: df
Out[202]: 
                     value              tvalue            delta
2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00
2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00
2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00
2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00
2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00
2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00
2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00


Getting the answer in minutes while disregarding the day difference (your last date is 3/20, the one before it is 3/17) is trickier: divide by one minute, then take the result modulo the number of minutes in a day.

In [204]: df['ans'] = df['delta'].apply(lambda x: x  / np.timedelta64(1,'m')).astype('int64') % (24*60)

In [205]: df
Out[205]: 
                     value              tvalue            delta  ans
2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00    0
2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00    6
2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00   12
2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00    2
2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00    2
2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00    8
2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00   23
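
For anyone who wants to reproduce the steps above, here is a minimal self-contained sketch. It is not the exact session from this answer: it assumes a reasonably recent pandas, rebuilds the sample frame by hand from the printed output, fills the first row with pd.Timedelta(0), and swaps the apply call for a vectorized division, which should give the same minutes.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame (timestamps copied from the output shown above).
index = pd.DatetimeIndex([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=index)

# Difference between each timestamp and the previous one. The first row has
# no predecessor, so its NaT is replaced with a zero-length timedelta.
tvalue = df.index.to_series()
df["delta"] = (tvalue - tvalue.shift()).fillna(pd.Timedelta(0))

# Convert to minutes, then drop whole days with a modulo on minutes per day.
minutes = df["delta"] / np.timedelta64(1, "m")
df["ans"] = minutes.astype("int64") % (24 * 60)

print(df[["value", "delta", "ans"]])
```

The modulo is what discards the three-day gap in the last row. Leave it out if the full elapsed time is wanted.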

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  予麋鹿        
                
              
                            
                2020-11-27 12:17
              
            
            
                                                                       
We can use to_series to create a series whose index and values are both the index keys, then compute the differences between successive rows, which gives a timedelta64[ns] dtype. From there, the .dt accessor exposes the seconds attribute (the time-of-day portion of each difference, with whole days excluded), and dividing each element by 60 gives the result in minutes. The first value, which has no predecessor, can optionally be filled with 0.

In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
    ...: df                                 # use .astype(int) to obtain integer values
Out[13]: 
                     value  deltaT
time                              
2012-03-16 23:50:00      1     0.0
2012-03-16 23:56:00      2     6.0
2012-03-17 00:08:00      3    12.0
2012-03-17 00:10:00      4     2.0
2012-03-17 00:12:00      5     2.0
2012-03-17 00:20:00      6     8.0
2012-03-20 00:43:00      7    23.0




Step by step:

When we perform diff:

In [8]: ser_diff = df.index.to_series().diff()

In [9]: ser_diff
Out[9]: 
time
2012-03-16 23:50:00               NaT
2012-03-16 23:56:00   0 days 00:06:00
2012-03-17 00:08:00   0 days 00:12:00
2012-03-17 00:10:00   0 days 00:02:00
2012-03-17 00:12:00   0 days 00:02:00
2012-03-17 00:20:00   0 days 00:08:00
2012-03-20 00:43:00   3 days 00:23:00
Name: time, dtype: timedelta64[ns]


Seconds to minutes conversion:

In [10]: ser_diff.dt.seconds.div(60, fill_value=0)
Out[10]: 
time
2012-03-16 23:50:00     0.0
2012-03-16 23:56:00     6.0
2012-03-17 00:08:00    12.0
2012-03-17 00:10:00     2.0
2012-03-17 00:12:00     2.0
2012-03-17 00:20:00     8.0
2012-03-20 00:43:00    23.0
Name: time, dtype: float64




If you also want to include the date portion, which was excluded above because only the time portion was considered, dt.total_seconds gives the full elapsed duration in seconds, and dividing by 60 again converts it to minutes.

In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0)
Out[12]: 
time
2012-03-16 23:50:00       0.0
2012-03-16 23:56:00       6.0
2012-03-17 00:08:00      12.0
2012-03-17 00:10:00       2.0
2012-03-17 00:12:00       2.0
2012-03-17 00:20:00       8.0
2012-03-20 00:43:00    4343.0    # <-- number of minutes in 3 days 23 minutes
Name: time, dtype: float64
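
The contrast between the two accessors can be checked side by side. The following is a small sketch under the assumption of a recent pandas. The variable names minutes_within_day and minutes_total are made up for illustration, and the first-row NaN is filled with fillna(0) after the division rather than through fill_value.

```python
import pandas as pd

# Same timestamps as the sample frame, built directly as a named index.
index = pd.DatetimeIndex([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
], name="time")
ser_diff = index.to_series().diff()

# .dt.seconds is only the seconds component of each timedelta (whole days are
# held separately in .dt.days), so the 3-day gap is ignored here.
minutes_within_day = ser_diff.dt.seconds.div(60).fillna(0)

# .dt.total_seconds() is the full duration, days included.
minutes_total = ser_diff.dt.total_seconds().div(60).fillna(0)

print(pd.DataFrame({"within_day": minutes_within_day, "total": minutes_total}))
```

Only the last row differs between the two columns, because it is the only difference that spans more than a day.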

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2020-11-27 12:32
              
            
            
                                                                       

Requires NumPy >= 1.7.0.

You can also cast df.index.to_series().diff() from timedelta64[ns] (nanoseconds, the default dtype) to timedelta64[m] (minutes). This is a frequency conversion: casting with astype is equivalent to floor division.
df['ΔT'] = df.index.to_series().diff().astype('timedelta64[m]')

                     value      ΔT
time                              
2012-03-16 23:50:00      1     NaN
2012-03-16 23:56:00      2     6.0
2012-03-17 00:08:00      3    12.0
2012-03-17 00:10:00      4     2.0
2012-03-17 00:12:00      5     2.0
2012-03-17 00:20:00      6     8.0
2012-03-20 00:43:00      7  4343.0

(ΔT dtype: float64)
To convert to int, fill the missing first value with a zero timedelta before converting:
>>> df.index.to_series().diff().fillna(pd.Timedelta(0)).astype('timedelta64[m]').astype('int')

time
2012-03-16 23:50:00       0
2012-03-16 23:56:00       6
2012-03-17 00:08:00      12
2012-03-17 00:10:00       2
2012-03-17 00:12:00       2
2012-03-17 00:20:00       8
2012-03-20 00:43:00    4343
Name: time, dtype: int64


Timedelta data types support a large number of time units, as well as generic units which can be coerced into any of the other units.
Below are the date units:
Y   year
M   month
W   week
D   day

Below are the time units:
h   hour
m   minute
s   second
ms  millisecond
us  microsecond
ns  nanosecond
ps  picosecond
fs  femtosecond
as  attosecond
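
As a quick illustration of how these unit codes behave in plain NumPy, here is a short sketch. The 90-second span is an arbitrary example value chosen for this illustration and is not taken from the question's data.

```python
import numpy as np

one_minute = np.timedelta64(1, "m")
one_hour = np.timedelta64(1, "h")

# True division of two timedeltas gives a plain float ratio.
minutes_per_hour = one_hour / one_minute

# An arbitrary example span of 90 seconds.
span = np.timedelta64(90, "s")

# Casting to a coarser unit keeps only whole units of the target unit ...
span_whole_minutes = span.astype("timedelta64[m]")

# ... while true division keeps the fractional part.
span_fractional_minutes = span / one_minute

print(minutes_per_hour, span_whole_minutes, span_fractional_minutes)
```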


If you want the difference with its fractional part, use true division instead, i.e., divide by np.timedelta64(1, 'm').

For example, suppose df is as below:
                     value
time                      
2012-03-16 23:50:21      1
2012-03-16 23:56:28      2
2012-03-17 00:08:08      3
2012-03-17 00:10:56      4
2012-03-17 00:12:12      5
2012-03-17 00:20:00      6
2012-03-20 00:43:43      7


Compare the results of astype (floor division) and true division below.
>>> df.index.to_series().diff().astype('timedelta64[m]')
time
2012-03-16 23:50:21       NaN
2012-03-16 23:56:28       6.0
2012-03-17 00:08:08      11.0
2012-03-17 00:10:56       2.0
2012-03-17 00:12:12       1.0
2012-03-17 00:20:00       7.0
2012-03-20 00:43:43    4343.0
Name: time, dtype: float64

>>> df.index.to_series().diff()/np.timedelta64(1, 'm')
time
2012-03-16 23:50:21            NaN
2012-03-16 23:56:28       6.116667
2012-03-17 00:08:08      11.666667
2012-03-17 00:10:56       2.800000
2012-03-17 00:12:12       1.266667
2012-03-17 00:20:00       7.800000
2012-03-20 00:43:43    4343.716667
Name: time, dtype: float64
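
The same comparison can be sketched without the astype cast, which may not be available for minute resolution in every pandas version. This is an assumption-laden alternative rather than the method used above: it does the true division by pd.Timedelta(minutes=1) and applies np.floor to get whole minutes. The names whole_minutes and fractional_minutes are illustrative only.

```python
import numpy as np
import pandas as pd

# Timestamps with non-zero seconds, copied from the example frame above.
index = pd.DatetimeIndex([
    "2012-03-16 23:50:21", "2012-03-16 23:56:28", "2012-03-17 00:08:08",
    "2012-03-17 00:10:56", "2012-03-17 00:12:12", "2012-03-17 00:20:00",
    "2012-03-20 00:43:43",
], name="time")
diff = index.to_series().diff()

# True division: minutes with the fractional part kept (first row is NaN).
fractional_minutes = diff / pd.Timedelta(minutes=1)

# Floor of the true division: whole minutes only, NaN preserved.
whole_minutes = np.floor(fractional_minutes)

print(pd.DataFrame({"floor": whole_minutes, "true": fractional_minutes}))
```

Both columns stay float64 because of the leading NaN. Fill or drop that row first if an integer dtype is needed.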



                                                                        
                                                        
            
            
              
                