Computing the mean for Python datetime

梦谈多话 2020-12-21 00:32

I have a datetime attribute:

d = {
    'DOB': pd.Series([
        datetime.datetime(2014, 7, 9),
        datetime.datetime(2014, 7, 15),
        np.datetim         


        
5 Answers
  • 2020-12-21 01:04

    You can convert the datetimes to epoch time (integer nanoseconds) using astype with np.int64, take the mean, and convert the result back to a datetime with pd.to_datetime:

    pd.to_datetime(df_test.DOB.dropna().astype(np.int64).mean())
    

    Output:

    Timestamp('2014-07-12 00:00:00')
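
    A self-contained sketch of the same idea follows. It rebuilds sample data modeled on the question, and the helper name mean_datetime_via_int64 is hypothetical, not part of pandas. It assumes timezone-naive values and casts to nanosecond resolution first so that the integers are always nanoseconds since the epoch.

```python
import pandas as pd

# Sample data modeled on the question; rebuilt here so the sketch runs on its own.
df_test = pd.DataFrame(
    {"DOB": pd.to_datetime(["2014-07-09", "2014-07-15", None])}
)


def mean_datetime_via_int64(series):
    """Hypothetical helper: mean of a tz-naive datetime Series, ignoring NaT."""
    valid = series.dropna()
    if valid.empty:
        return pd.NaT
    # Force nanosecond resolution so the integers are nanoseconds since the epoch.
    ns = valid.astype("datetime64[ns]").astype("int64")
    # The mean is computed in floating point, so sub-microsecond precision may be lost.
    return pd.Timestamp(int(ns.mean()))


mean_dob = mean_datetime_via_int64(df_test["DOB"])
```

    With the sample data above, mean_dob falls halfway between the two valid dates.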
    
  • 2020-12-21 01:15

    Datetime math supports some standard operations:

    a = datetime.datetime(2014, 7, 9)
    b = datetime.datetime(2014, 7, 15)
    c = (b - a)/2
    
    # here c will be datetime.timedelta(3)
    
    a + c
    Out[7]: datetime.datetime(2014, 7, 12, 0, 0)
    

    So you can write a function that, given two datetimes, subtracts the lesser from the greater and adds half of the difference to the lesser. Apply this function to your dataframe, and shazam!
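
    The function described above might look like the sketch below, using only the standard library. The names midpoint and mean_datetime are hypothetical, and the second function is an assumed generalization to more than two values that averages the offsets from the earliest value.

```python
import datetime


def midpoint(a, b):
    """Return the datetime halfway between a and b, in either order."""
    lesser, greater = min(a, b), max(a, b)
    return lesser + (greater - lesser) / 2


def mean_datetime(values):
    """Mean of one or more datetimes: earliest value plus the average offset from it."""
    values = list(values)
    base = min(values)
    total = sum((v - base for v in values), datetime.timedelta(0))
    return base + total / len(values)


a = datetime.datetime(2014, 7, 9)
b = datetime.datetime(2014, 7, 15)
mid = midpoint(a, b)
```

    Neither function handles missing values, so any NaT or None entries would need to be filtered out first.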

  • 2020-12-21 01:16

    As of pandas=0.25, it is possible to compute the mean of a datetime series.

    In [1]: import pandas as pd
       ...: import numpy as np
    
    In [2]: s = pd.Series([
       ...:     pd.Timestamp(2014, 7, 9),
       ...:     pd.Timestamp(2014, 7, 15),
       ...:     np.datetime64('NaT')])
    
    In [3]: s.mean()
    Out[3]: Timestamp('2014-07-12 00:00:00')
    

    However, note that at the time of writing (pandas 0.25), applying mean to a whole DataFrame ignores any datetime columns, so take the mean of the datetime column itself.
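
    As a minimal sketch of working on the column rather than the whole frame, the example below selects the datetime column and calls mean on it. The frame and its score column are made up for illustration, and it assumes pandas 0.25 or later.

```python
import pandas as pd

# Illustrative frame: one datetime column with a missing value, one numeric column.
df = pd.DataFrame(
    {
        "DOB": pd.to_datetime(["2014-07-09", "2014-07-15", None]),
        "score": [1.0, 2.0, 3.0],
    }
)

# Series.mean skips NaT by default, just as it skips NaN for numeric data.
dob_mean = df["DOB"].mean()
```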

  • 2020-12-21 01:23

    You can take the mean of Timedelta. So find the minimum value and subtract it from the series to get a series of Timedelta. Then take the mean and add it back to the minimum.

    dob = df_test.DOB
    m = dob.min()
    (m + (dob - m).mean()).to_pydatetime()
    
    datetime.datetime(2014, 7, 12, 0, 0)
    

    As a one-liner:

    df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(d.min())).to_pydatetime()
    

    To @ALollz's point, you can use the epoch pd.Timestamp(0) as the reference instead of the minimum:

    df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(pd.Timestamp(0))).to_pydatetime()
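
    The reference point does not change the result, because the mean of the offsets from any fixed reference, added back to that reference, is the mean of the original values. A small sketch comparing both choices follows; the helper name mean_about is hypothetical and the sample data is modeled on the question.

```python
import pandas as pd

# Sample data modeled on the question, including one missing value.
dob = pd.Series(pd.to_datetime(["2014-07-09", "2014-07-15", None]))


def mean_about(series, reference):
    """Hypothetical helper: average the offsets from `reference`, then shift back."""
    return reference + (series - reference).mean()


from_min = mean_about(dob, dob.min())          # reference is the earliest date
from_epoch = mean_about(dob, pd.Timestamp(0))  # reference is 1970-01-01
```

    Both calls skip the NaT entry, since the mean of a Timedelta series ignores missing values by default.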
    
  • 2020-12-21 01:23

    You could work with Unix time if you want. This is defined as the time elapsed since 1970-01-01 00:00:00 UTC, counted here in seconds (other units work too). With that, all of your times are simply floats, so it's very easy to do simple math on the columns.

    import pandas as pd
    
    df_test['unix_time'] = (df_test.DOB - pd.to_datetime('1970-01-01')).dt.total_seconds()
    
    df_test['unix_time'].mean()
    #1405123200.0
    
    # You want it in date, so just convert back
    pd.to_datetime(df_test['unix_time'].mean(), origin='unix', unit='s')
    #Timestamp('2014-07-12 00:00:00')
    