Computing the mean for Python datetime

Posted by 别说谁变了你拦得住时间么 on 2019-12-29 08:41:33

Question


I have a datetime attribute:

import datetime

import numpy as np
import pandas as pd

d = {
    'DOB': pd.Series([
        datetime.datetime(2014, 7, 9),
        datetime.datetime(2014, 7, 15),
        np.datetime64('NaT')
    ], index=['a', 'b', 'c'])
}
df_test = pd.DataFrame(d)

I would like to compute the mean for that attribute. Running mean() causes an error:

TypeError: reduction operation 'mean' not allowed for this dtype

I also tried the solution proposed elsewhere, but it doesn't work: running the function proposed there raises

OverflowError: Python int too large to convert to C long

What would you propose? The result for the above dataframe should be equivalent to

datetime.datetime(2014, 7, 12).

Answer 1:


You can take the mean of a Timedelta series. So find the minimum value and subtract it from the series to get a series of Timedelta values, then take the mean of those and add it back to the minimum.

dob = df_test.DOB
m = dob.min()
(m + (dob - m).mean()).to_pydatetime()

datetime.datetime(2014, 7, 12, 0, 0)

As a one-liner:

df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(d.min())).to_pydatetime()

To @ALollz's point, you can use the epoch pd.Timestamp(0) instead of the minimum:

df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(pd.Timestamp(0))).to_pydatetime()
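The two variants above differ only in the reference point, so they can be folded into one small helper. This is a minimal sketch, not part of the original answer: the name mean_datetime and its origin parameter are made up for illustration, and it assumes a timezone-naive datetime Series.

```python
import datetime

import numpy as np
import pandas as pd


def mean_datetime(series, origin=None):
    """Mean of a datetime Series via Timedelta offsets from `origin`.

    `origin` (hypothetical parameter) defaults to the Series minimum;
    pass pd.Timestamp(0) to measure from the Unix epoch instead.
    NaT entries are skipped by both min() and mean().
    """
    if origin is None:
        origin = series.min()
    # (series - origin) is a Timedelta series, which supports mean().
    return origin + (series - origin).mean()


dob = pd.Series(
    [datetime.datetime(2014, 7, 9), datetime.datetime(2014, 7, 15), np.datetime64("NaT")],
    index=["a", "b", "c"],
)
result_min = mean_datetime(dob)
result_epoch = mean_datetime(dob, origin=pd.Timestamp(0))
print(result_min, result_epoch)
```

Both calls return a pandas Timestamp; call .to_pydatetime() on the result if a plain datetime.datetime is needed.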



Answer 2:


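You can convert the datetimes to epoch time (integer nanoseconds) using astype with np.int64, take the mean, and convert back to datetime with pd.to_datetime. Drop the NaT values first, since they have no meaningful integer value: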

pd.to_datetime(df_test.DOB.dropna().astype(np.int64).mean())

Output:

Timestamp('2014-07-12 00:00:00')
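One caveat with the integer approach: mean() returns a float64, and nanosecond counts for present-day dates (around 1.4e18) exceed the 2**53 range that float64 represents exactly, so the result can be off by some nanoseconds in general. If that matters, the averaging can be done in Python integers instead. The sketch below is an assumption-laden illustration: mean_datetime_exact is a made-up name, the Series is assumed timezone-naive, and the mean is floored to a whole nanosecond.

```python
import numpy as np
import pandas as pd


def mean_datetime_exact(series):
    """Mean of a datetime Series using integer nanoseconds since the epoch."""
    # Drop NaT, then force nanosecond resolution so the integers are
    # nanoseconds regardless of how the Series was constructed.
    nanos = series.dropna().astype("datetime64[ns]").astype(np.int64)
    if len(nanos) == 0:
        return pd.NaT
    # Python ints have arbitrary precision: the sum cannot overflow and
    # the floor division never passes through float64.
    total = sum(int(n) for n in nanos)
    return pd.Timestamp(total // len(nanos))


s = pd.Series(pd.to_datetime(["2014-07-09", "2014-07-15", None]))
exact_mean = mean_datetime_exact(s)
print(exact_mean)
```

For the two dates in the question the float and integer results agree; the difference only shows up when the true mean is not exactly representable as a float64.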



Answer 3:


You could work with Unix time if you want. This is defined as the total number of seconds (for instance) since 1970-01-01. With that, all of your times are simply floats, so it's very easy to do simple math on the column.

import pandas as pd

df_test['unix_time'] = (df_test.DOB - pd.to_datetime('1970-01-01')).dt.total_seconds()

df_test['unix_time'].mean()
#1405123200.0

# You want it in date, so just convert back
pd.to_datetime(df_test['unix_time'].mean(), origin='unix', unit='s')
#Timestamp('2014-07-12 00:00:00')



Answer 4:


Datetime math supports some standard operations:

a = datetime.datetime(2014, 7, 9)
b = datetime.datetime(2014, 7, 15)
c = (b - a)/2

# here c will be datetime.timedelta(days=3)

a + c
Out[7]: datetime.datetime(2014, 7, 12, 0, 0)

So you can write a function that, given two datetimes, subtracts the lesser from the greater and adds half of the difference to the lesser. Apply this function to your dataframe, and shazam!
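The same idea extends from two datetimes to any number of them: sum the offsets from the smallest value and divide by the count. A rough sketch in plain Python follows; mean_of_datetimes is a hypothetical helper name, and it assumes a non-empty sequence of naive datetime.datetime objects with no NaT or None entries.

```python
import datetime


def mean_of_datetimes(values):
    """Mean of a non-empty sequence of naive datetime.datetime objects."""
    values = list(values)
    base = min(values)
    # Sum the timedelta offsets from the earliest value; the explicit
    # start value keeps sum() from beginning with the integer 0.
    total = sum((v - base for v in values), datetime.timedelta(0))
    return base + total / len(values)


dates = [datetime.datetime(2014, 7, 9), datetime.datetime(2014, 7, 15)]
midpoint = mean_of_datetimes(dates)
print(midpoint)
```

To use it on the column from the question, pass something like df_test.DOB.dropna().dt.to_pydatetime() so that the NaT entry is removed first.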




Answer 5:


As of pandas 0.25, it is possible to compute the mean of a datetime series directly.

In [1]: import pandas as pd
   ...: import numpy as np

In [2]: s = pd.Series([
   ...:     pd.Timestamp(2014, 7, 9),
   ...:     pd.Timestamp(2014, 7, 15),
   ...:     np.datetime64('NaT')])

In [3]: s.mean()
Out[3]: Timestamp('2014-07-12 00:00:00')

However, note that at the time of writing (pandas 0.25), applying mean to a whole pandas DataFrame ignores columns that hold datetimes.
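One way around the DataFrame caveat, whatever the pandas version does by default, is to call mean() on each datetime column explicitly. The following is only a sketch: the frame and its score column are invented for illustration, and it assumes pandas 0.25 or later so that Series.mean() accepts datetimes.

```python
import pandas as pd

# Hypothetical frame: one datetime column and one numeric column.
df = pd.DataFrame({
    "DOB": pd.to_datetime(["2014-07-09", "2014-07-15", None]),
    "score": [1.0, 2.0, 3.0],
})

# Pick out the datetime columns and average each one as a Series,
# where the mean of datetimes is supported and NaT is skipped.
datetime_columns = df.select_dtypes(include=["datetime64"]).columns
datetime_means = {name: df[name].mean() for name in datetime_columns}
print(datetime_means)
```

The result is a plain dict mapping column names to Timestamps, which can sit alongside the usual numeric df.mean() output.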



Source: https://stackoverflow.com/questions/50358564/computing-the-mean-for-python-datetime
