Convert incomplete 12h datetime-like strings into appropriate datetime type

Submitted by 匆匆过客 on 2019-12-11 07:28:50

Question


I've got a pandas Series containing datetime-like strings in 12-hour format, but without the AM/PM abbreviations. It covers an entire month of data:

40    01/01/2017 11:51:00
41    01/01/2017 11:51:05
42    01/01/2017 11:55:05
43    01/01/2017 11:55:10
44    01/01/2017 11:59:30
45    01/01/2017 11:59:35
46    02/01/2017 12:00:05
47    02/01/2017 12:00:10
48    02/01/2017 12:13:20
49    02/01/2017 12:13:25
50    02/01/2017 12:24:50
51    02/01/2017 12:24:55
52    02/01/2017 12:33:30
Name: TS, dtype: object
(318621,) # shape

My goal is to convert it to a datetime type, so as to obtain the appropriate Unix timestamp values and to do comparisons and arithmetic with other datetime data that is, this time, in 24-hour format. So I already tried this:

pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S') # %I for 12h format

This outputs:

64     2017-01-02 00:46:50
65     2017-01-02 00:46:55
66     2017-01-02 01:01:00
67     2017-01-02 01:01:05
68     2017-01-02 01:05:00

But the AM/PM information is not taken into account. I know that, as a rule, the AM/PM marker first has to be present in the strings; then one can use datetime.datetime.strptime() or pd.to_datetime() to parse them with the %p directive.
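For reference, this is roughly what the %p route looks like once the marker is present. It is a minimal sketch on made-up strings (the `labelled` values are illustrative, not taken from the real Series), and it assumes an English/C locale so that "AM" and "PM" are recognized:

```python
import pandas as pd

# Illustrative strings that already carry an AM/PM marker.
labelled = pd.Series([
    "01/01/2017 12:00:00 AM",
    "01/01/2017 11:59:55 AM",
    "01/01/2017 12:00:00 PM",
    "01/01/2017 01:00:05 PM",
])

# %I reads the 12-hour clock value, %p reads the AM/PM marker.
parsed = pd.to_datetime(labelled, format="%d/%m/%Y %I:%M:%S %p")
print(parsed)
```

With the marker in place the parser has everything it needs; the difficulty in this question is only that the marker is missing.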

So I wanted to know whether there is another way to deal with this issue through the datetime or pandas datetime modules. Or do I have to manually add the 'am/pm' abbreviations before parsing?


Answer 1:


You have data in 5 second intervals throughout multiple days. The desired end format is like this (including the AM/PM marker, which we need to add ourselves: pandas cannot guess it, since it parses one value at a time):

31/12/2016 11:59:55 PM
01/01/2017 12:00:00 AM
01/01/2017 12:00:05 AM
01/01/2017 11:59:55 AM
01/01/2017 12:00:00 PM
01/01/2017 12:59:55 PM
01/01/2017 01:00:00 PM
01/01/2017 01:00:05 PM
01/01/2017 11:59:55 PM
02/01/2017 12:00:00 AM

First, we can parse the whole thing without AM/PM info, as you already showed:

ts = pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S')

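One thing to check: an hour of 12 must end up as 00, not 12. The output in the question suggests pandas already maps 12:xx:xx to 00:xx:xx when %p is absent, in which case the next line changes nothing; it is kept as a safeguard: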

ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')

Now we have times from 00:00:00 to 11:59:55, twice per day.
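How hour 12 is handled can be checked directly before relying on the subtraction. The sketch below uses one string taken from the question; the names `FMT`, `raw` and `normalized` are only illustrative. CPython's standard-library strptime treats a value without %p as AM, so hour 12 comes back as 0; for pandas it prints the hour so it can be inspected rather than assumed:

```python
from datetime import datetime

import pandas as pd

FMT = "%d/%m/%Y %I:%M:%S"
raw = "02/01/2017 12:00:05"

# Standard library: %I without %p is treated as AM, so hour 12 becomes 0.
py_hour = datetime.strptime(raw, FMT).hour

# pandas: print the parsed hour instead of assuming it.
ts_check = pd.to_datetime(pd.Series([raw]), format=FMT)
print(py_hour, ts_check.dt.hour.tolist())

# Non-mutating form of the normalization; a no-op if the hour is already 0.
normalized = ts_check.where(ts_check.dt.hour != 12,
                            ts_check - pd.Timedelta(hours=12))
```

Either way, `normalized` ends up with hour 0 for this row, which is what the following steps expect.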

Next, note that the transitions (both midnight and noon) now always read 00:00:00. We can easily detect these, as well as the first instance of each date:

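import datetime

twelve = ts.dt.time == datetime.time(0, 0, 0)           # rows reading 00:00:00
newdate = ts.dt.normalize().diff() > pd.Timedelta(0)   # first row of each new date
midnight = twelve & newdate                             # 00:00:00 on a new date
noon = twelve & ~newdate                                # 00:00:00 within the same date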
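These masks rely on a sample landing exactly on 12:00:00. The excerpt in the question jumps from 11:59:35 to 12:00:05, so that may not always hold. One possible variant, which is not part of this answer, is to look for rows where the parsed timestamp goes backwards: within a single date that can only mean the 12-hour clock wrapped at noon. The sketch below uses a hypothetical helper name and made-up rows, and it assumes the data is in chronological order, has no gap that skips a whole half-day, and starts each date in its AM half:

```python
import pandas as pd

def infer_pm_by_wraparound(strings, fmt="%d/%m/%Y %I:%M:%S"):
    """Hypothetical helper: add 12 h to rows after a same-day clock wrap."""
    ts = pd.to_datetime(pd.Series(strings), format=fmt)
    # Map hour 12 to hour 0 so each half-day runs from 00:00:00 to 11:59:59.
    ts = ts.where(ts.dt.hour != 12, ts - pd.Timedelta(hours=12))
    # A backwards step only occurs within one date: the clock wrapped at noon.
    wrapped = ts.diff() < pd.Timedelta(0)
    # Every row at or after the wrap on that date is treated as PM.
    is_pm = wrapped.astype(int).groupby(ts.dt.normalize()).cumsum() > 0
    return ts + pd.to_timedelta(is_pm.astype(int) * 12, unit="h")

# Made-up rows with irregular sampling (no exact 12:00:00 anywhere).
rows = [
    "01/01/2017 11:59:35",  # AM
    "01/01/2017 12:00:05",  # PM (clock wrapped)
    "01/01/2017 01:15:00",  # PM
    "01/01/2017 11:59:35",  # PM
    "02/01/2017 12:00:05",  # AM (new date)
    "02/01/2017 11:59:50",  # AM
    "02/01/2017 12:00:10",  # PM (clock wrapped)
]
out = infer_pm_by_wraparound(rows)
print(out)
```

If the data starts in the afternoon, the rows before the first wrap would be labelled AM by this sketch and would need to be corrected by hand.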

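Next, build an offset series, which should be easy to inspect for correctness (rows before the first detected midnight or noon have no marker to inherit, stay NaT, and need separate handling):

offset = pd.Series(pd.NaT, index=ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)      # AM half: nothing to add
offset[noon] = pd.Timedelta(12, 'h')    # PM half: add twelve hours
offset = offset.ffill()                 # carry each marker forward to the next one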

And finally:

ts += offset
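Put together, the steps can be sketched as one function and checked against the ten sample rows listed at the top of this answer, with the AM/PM column stripped. The helper name `to_24h` is hypothetical, and two details are assumptions rather than part of the answer: the offsets are built in float hours instead of a timedelta Series, and rows before the first marker are assigned to the opposite half-day of that marker (AM if no marker exists at all).

```python
import numpy as np
import pandas as pd

FMT = "%d/%m/%Y %I:%M:%S"
HALF_DAY_H = 12.0

def to_24h(strings):
    """Hypothetical helper: infer AM/PM from row order, return 24 h timestamps."""
    ts = pd.to_datetime(pd.Series(strings), format=FMT)
    # Map hour 12 to hour 0 (a no-op if the parser already did so).
    ts = ts.where(ts.dt.hour != 12, ts - pd.Timedelta(hours=12))

    twelve = ts == ts.dt.normalize()                       # rows reading 00:00:00
    newdate = ts.dt.normalize().diff() > pd.Timedelta(0)   # first row of a new date
    midnight = twelve & newdate
    noon = twelve & ~newdate

    # Offset in hours: 0 from each midnight marker, 12 from each noon marker,
    # carried forward until the next marker.
    hours = pd.Series(
        np.where(midnight, 0.0, np.where(noon, HALF_DAY_H, np.nan)),
        index=ts.index,
    ).ffill()

    # Assumption: rows before the first marker belong to the opposite
    # half-day of that marker; with no marker at all, assume AM.
    known = hours.dropna()
    lead = HALF_DAY_H - known.iloc[0] if len(known) else 0.0
    hours = hours.fillna(lead)

    return ts + pd.to_timedelta(hours, unit="h")

# The sample rows from this answer, without their AM/PM column.
sample = [
    "31/12/2016 11:59:55",
    "01/01/2017 12:00:00",
    "01/01/2017 12:00:05",
    "01/01/2017 11:59:55",
    "01/01/2017 12:00:00",
    "01/01/2017 12:59:55",
    "01/01/2017 01:00:00",
    "01/01/2017 01:00:05",
    "01/01/2017 11:59:55",
    "02/01/2017 12:00:00",
]
result = to_24h(sample)
print(result)
```

Like the step-by-step version, this only works when a row lands exactly on every 12:00:00 and the rows are in chronological order.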


Source: https://stackoverflow.com/questions/51018182/convert-incomplete-12h-datetime-like-strings-into-appropriate-datetime-type
