How do I convert a numpy.datetime64
object to a datetime.datetime
(or Timestamp
)?
In the following code, I create a datetime,
You can just use the pd.Timestamp constructor. The following diagram may be useful for this and related questions.
One option is to use str
, and then to_datetime
(or similar):
In [11]: str(dt64)
Out[11]: '2012-05-01T01:00:00.000000+0100'
In [12]: pd.to_datetime(str(dt64))
Out[12]: datetime.datetime(2012, 5, 1, 1, 0, tzinfo=tzoffset(None, 3600))
Note: it is not equal to dt
because it's become "offset-aware":
In [13]: pd.to_datetime(str(dt64)).replace(tzinfo=None)
Out[13]: datetime.datetime(2012, 5, 1, 1, 0)
This seems inelegant.
.
Update: this can deal with the "nasty example":
In [21]: dt64 = numpy.datetime64('2002-06-28T01:00:00.000000000+0100')
In [22]: pd.to_datetime(str(dt64)).replace(tzinfo=None)
Out[22]: datetime.datetime(2002, 6, 28, 1, 0)
Welcome to hell.
You can just pass a datetime64 object to pandas.Timestamp
:
In [16]: Timestamp(numpy.datetime64('2012-05-01T01:00:00.000000'))
Out[16]: <Timestamp: 2012-05-01 01:00:00>
I noticed that this doesn't work right though in NumPy 1.6.1:
numpy.datetime64('2012-05-01T01:00:00.000000+0100')
Also, pandas.to_datetime
can be used (this is off of the dev version, haven't checked v0.9.1):
In [24]: pandas.to_datetime('2012-05-01T01:00:00.000000+0100')
Out[24]: datetime.datetime(2012, 5, 1, 1, 0, tzinfo=tzoffset(None, 3600))
I think there could be a more consolidated effort in an answer to better explain the relationship between Python's datetime module, numpy's datetime64/timedelta64 and pandas' Timestamp/Timedelta objects.
The datetime standard library has four main objects
>>> import datetime
>>> datetime.time(hour=4, minute=3, second=10, microsecond=7199)
datetime.time(4, 3, 10, 7199)
>>> datetime.date(year=2017, month=10, day=24)
datetime.date(2017, 10, 24)
>>> datetime.datetime(year=2017, month=10, day=24, hour=4, minute=3, second=10, microsecond=7199)
datetime.datetime(2017, 10, 24, 4, 3, 10, 7199)
>>> datetime.timedelta(days=3, minutes = 55)
datetime.timedelta(3, 3300)
>>> # add timedelta to datetime
>>> datetime.timedelta(days=3, minutes = 55) + \
datetime.datetime(year=2017, month=10, day=24, hour=4, minute=3, second=10, microsecond=7199)
datetime.datetime(2017, 10, 27, 4, 58, 10, 7199)
NumPy has no separate date and time objects, just a single datetime64 object to represent a single moment in time. The datetime module's datetime object has microsecond precision (one-millionth of a second). NumPy's datetime64 object allows you to set its precision from hours all the way to attoseconds (10 ^ -18). It's constructor is more flexible and can take a variety of inputs.
Pass an integer with a string for the units. See all units here. It gets converted to that many units after the UNIX epoch: Jan 1, 1970
>>> np.datetime64(5, 'ns')
numpy.datetime64('1970-01-01T00:00:00.000000005')
>>> np.datetime64(1508887504, 's')
numpy.datetime64('2017-10-24T23:25:04')
You can also use strings as long as they are in ISO 8601 format.
>>> np.datetime64('2017-10-24')
numpy.datetime64('2017-10-24')
Timedeltas have a single unit
>>> np.timedelta64(5, 'D') # 5 days
>>> np.timedelta64(10, 'h') 10 hours
Can also create them by subtracting two datetime64 objects
>>> np.datetime64('2017-10-24T05:30:45.67') - np.datetime64('2017-10-22T12:35:40.123')
numpy.timedelta64(147305547,'ms')
A pandas Timestamp is a moment in time very similar to a datetime but with much more functionality. You can construct them with either pd.Timestamp
or pd.to_datetime
.
>>> pd.Timestamp(1239.1238934) #defautls to nanoseconds
Timestamp('1970-01-01 00:00:00.000001239')
>>> pd.Timestamp(1239.1238934, unit='D') # change units
Timestamp('1973-05-24 02:58:24.355200')
>>> pd.Timestamp('2017-10-24 05') # partial strings work
Timestamp('2017-10-24 05:00:00')
pd.to_datetime
works very similarly (with a few more options) and can convert a list of strings into Timestamps.
>>> pd.to_datetime('2017-10-24 05')
Timestamp('2017-10-24 05:00:00')
>>> pd.to_datetime(['2017-1-1', '2017-1-2'])
DatetimeIndex(['2017-01-01', '2017-01-02'], dtype='datetime64[ns]', freq=None)
>>> dt = datetime.datetime(year=2017, month=10, day=24, hour=4,
minute=3, second=10, microsecond=7199)
>>> np.datetime64(dt)
numpy.datetime64('2017-10-24T04:03:10.007199')
>>> pd.Timestamp(dt) # or pd.to_datetime(dt)
Timestamp('2017-10-24 04:03:10.007199')
>>> dt64 = np.datetime64('2017-10-24 05:34:20.123456')
>>> unix_epoch = np.datetime64(0, 's')
>>> one_second = np.timedelta64(1, 's')
>>> seconds_since_epoch = (dt64 - unix_epoch) / one_second
>>> seconds_since_epoch
1508823260.123456
>>> datetime.datetime.utcfromtimestamp(seconds_since_epoch)
>>> datetime.datetime(2017, 10, 24, 5, 34, 20, 123456)
Convert to Timestamp
>>> pd.Timestamp(dt64)
Timestamp('2017-10-24 05:34:20.123456')
This is quite easy as pandas timestamps are very powerful
>>> ts = pd.Timestamp('2017-10-24 04:24:33.654321')
>>> ts.to_pydatetime() # Python's datetime
datetime.datetime(2017, 10, 24, 4, 24, 33, 654321)
>>> ts.to_datetime64()
numpy.datetime64('2017-10-24T04:24:33.654321000')
I've come back to this answer more times than I can count, so I decided to throw together a quick little class, which converts a Numpy datetime64
value to Python datetime
value. I hope it helps others out there.
from datetime import datetime
import pandas as pd
class NumpyConverter(object):
@classmethod
def to_datetime(cls, dt64, tzinfo=None):
"""
Converts a Numpy datetime64 to a Python datetime.
:param dt64: A Numpy datetime64 variable
:type dt64: numpy.datetime64
:param tzinfo: The timezone the date / time value is in
:type tzinfo: pytz.timezone
:return: A Python datetime variable
:rtype: datetime
"""
ts = pd.to_datetime(dt64)
if tzinfo is not None:
return datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second, tzinfo=tzinfo)
return datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)
I'm gonna keep this in my tool bag, something tells me I'll need it again.
To convert numpy.datetime64
to datetime object that represents time in UTC on numpy-1.8
:
>>> from datetime import datetime
>>> import numpy as np
>>> dt = datetime.utcnow()
>>> dt
datetime.datetime(2012, 12, 4, 19, 51, 25, 362455)
>>> dt64 = np.datetime64(dt)
>>> ts = (dt64 - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')
>>> ts
1354650685.3624549
>>> datetime.utcfromtimestamp(ts)
datetime.datetime(2012, 12, 4, 19, 51, 25, 362455)
>>> np.__version__
'1.8.0.dev-7b75899'
The above example assumes that a naive datetime object is interpreted by np.datetime64
as time in UTC.
To convert datetime to np.datetime64 and back (numpy-1.6
):
>>> np.datetime64(datetime.utcnow()).astype(datetime)
datetime.datetime(2012, 12, 4, 13, 34, 52, 827542)
It works both on a single np.datetime64 object and a numpy array of np.datetime64.
Think of np.datetime64 the same way you would about np.int8, np.int16, etc and apply the same methods to convert beetween Python objects such as int, datetime and corresponding numpy objects.
Your "nasty example" works correctly:
>>> from datetime import datetime
>>> import numpy
>>> numpy.datetime64('2002-06-28T01:00:00.000000000+0100').astype(datetime)
datetime.datetime(2002, 6, 28, 0, 0)
>>> numpy.__version__
'1.6.2' # current version available via pip install numpy
I can reproduce the long
value on numpy-1.8.0
installed as:
pip install git+https://github.com/numpy/numpy.git#egg=numpy-dev
The same example:
>>> from datetime import datetime
>>> import numpy
>>> numpy.datetime64('2002-06-28T01:00:00.000000000+0100').astype(datetime)
1025222400000000000L
>>> numpy.__version__
'1.8.0.dev-7b75899'
It returns long
because for numpy.datetime64
type .astype(datetime)
is equivalent to .astype(object)
that returns Python integer (long
) on numpy-1.8
.
To get datetime object you could:
>>> dt64.dtype
dtype('<M8[ns]')
>>> ns = 1e-9 # number of seconds in a nanosecond
>>> datetime.utcfromtimestamp(dt64.astype(int) * ns)
datetime.datetime(2002, 6, 28, 0, 0)
To get datetime64 that uses seconds directly:
>>> dt64 = numpy.datetime64('2002-06-28T01:00:00.000000000+0100', 's')
>>> dt64.dtype
dtype('<M8[s]')
>>> datetime.utcfromtimestamp(dt64.astype(int))
datetime.datetime(2002, 6, 28, 0, 0)
The numpy docs say that the datetime API is experimental and may change in future numpy versions.