I have an array of datetime64 type:
dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64')
Is there a better way than looping over each element to get the years?
This is how I do it.
import numpy as np

def dt2cal(dt):
    """
    Convert array of datetime64 to a calendar array of year, month, day, hour,
    minute, second, microsecond with these quantities indexed on the last axis.

    Parameters
    ----------
    dt : datetime64 array (...)
        numpy.ndarray of datetimes of arbitrary shape

    Returns
    -------
    cal : uint32 array (..., 7)
        calendar array with last axis representing year, month, day, hour,
        minute, second, microsecond
    """
    # allocate output
    out = np.empty(dt.shape + (7,), dtype="u4")
    # decompose calendar floors
    Y, M, D, h, m, s = [dt.astype(f"M8[{x}]") for x in "YMDhms"]
    out[..., 0] = Y + 1970  # Gregorian year
    out[..., 1] = (M - Y) + 1  # month
    out[..., 2] = (D - M) + 1  # day
    out[..., 3] = (dt - D).astype("m8[h]")  # hour
    out[..., 4] = (dt - h).astype("m8[m]")  # minute
    out[..., 5] = (dt - m).astype("m8[s]")  # second
    out[..., 6] = (dt - s).astype("m8[us]")  # microsecond
    return out
It's vectorized across arbitrary input dimensions, it's fast, it's intuitive, it works on numpy v1.15.4, and it doesn't use pandas.
I really wish numpy supported this functionality; it's required all the time in application development. I always get super nervous when I have to roll my own stuff like this, because I always feel like I'm missing an edge case.
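One way to ease the edge-case worry is to compare the result against Python's own datetime fields for a few awkward inputs. The following is only a sketch: the sample values (a pre-1970 timestamp, a leap day, a midnight) are my own choices, and dt2cal is repeated from above so the snippet runs on its own.

```python
import numpy as np

def dt2cal(dt):
    # Same decomposition as in the answer above, repeated for self-containment.
    out = np.empty(dt.shape + (7,), dtype="u4")
    Y, M, D, h, m, s = [dt.astype(f"M8[{x}]") for x in "YMDhms"]
    out[..., 0] = Y + 1970
    out[..., 1] = (M - Y) + 1
    out[..., 2] = (D - M) + 1
    out[..., 3] = (dt - D).astype("m8[h]")
    out[..., 4] = (dt - h).astype("m8[m]")
    out[..., 5] = (dt - m).astype("m8[s]")
    out[..., 6] = (dt - s).astype("m8[us]")
    return out

# Hypothetical test inputs: before the epoch, a leap day, and an exact midnight.
samples = np.array(
    ["1969-12-31T23:59:59.999999", "2000-02-29T12:34:56.000001", "2012-01-15T00:00:00"],
    dtype="M8[us]",
)
cal = dt2cal(samples)

# At microsecond resolution, tolist() yields datetime.datetime objects,
# whose fields serve as the reference values.
expected = [
    [d.year, d.month, d.day, d.hour, d.minute, d.second, d.microsecond]
    for d in samples.tolist()
]
print(cal.tolist() == expected)
```

This only covers dates that datetime.datetime can represent (years 1 through 9999), so it is a sanity check rather than a proof.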
Use dates.tolist() to convert to native datetime objects, then simply access year. Example:
>>> dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64')
>>> [x.year for x in dates.tolist()]
[2010, 2011, 2012]
This is basically the same idea presented in https://stackoverflow.com/a/35281829/2192272, but using simpler syntax.
Tested with python 3.6 / numpy 1.18.
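One caveat, as far as I know: tolist() only gives date/datetime objects when the array's unit is one Python's datetime can represent. For a nanosecond-resolution array (the default in a lot of pandas-derived data) it returns plain integers instead. A minimal sketch of working around that by casting to day resolution first; the variable names here are just for illustration:

```python
import numpy as np

dates_ns = np.array(
    ["2010-10-17", "2011-05-13", "2012-01-15"], dtype="datetime64[ns]"
)

# Cast to day resolution so tolist() yields datetime.date objects
# rather than integer nanosecond counts.
as_dates = dates_ns.astype("datetime64[D]").tolist()

years = [d.year for d in as_dates]
months = [d.month for d in as_dates]
days = [d.day for d in as_dates]
print(years, months, days)
```

If the time of day matters, casting to "datetime64[us]" instead should give datetime.datetime objects with hour, minute and second attributes as well.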
Another possibility is:
np.datetime64(dates,'Y') - returns - numpy.datetime64('2010')
or
np.datetime64(dates,'Y').astype(int)+1970 - returns - 2010
but this works only on scalar values; it won't take an array.
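For arrays, the same idea seems to work if you go through astype instead of the np.datetime64 constructor. A rough sketch (the month and day lines are my extension of the approach, not something from the answer above):

```python
import numpy as np

dates = np.array(
    ["2010-10-17", "2011-05-13", "2012-01-15"], dtype="datetime64[D]"
)

# Years since 1970, shifted to calendar years.
years = dates.astype("datetime64[Y]").astype("int64") + 1970

# Months since 1970-01, reduced to 1..12.
months = dates.astype("datetime64[M]").astype("int64") % 12 + 1

# Days elapsed since the first of the month, shifted to 1-based.
days = (dates - dates.astype("datetime64[M]")).astype("int64") + 1

print(years, months, days)
```

Everything stays inside numpy, so this avoids the Python-level loop entirely.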
Using numpy version 1.10.4 and pandas version 0.17.1,
import numpy as np
import pandas as pd
dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype=np.datetime64)
pd.to_datetime(dates).year
I get what you're looking for:
array([2010, 2011, 2012], dtype=int32)