Pandas DatetimeIndex indexing dtype: datetime64 vs Timestamp

我的未来我决定 submitted on 2019-12-12 20:07:51


Indexing a pandas DatetimeIndex (with dtype numpy datetime64[ns]) returns either:

  • another DatetimeIndex for multiple indices
  • a pandas Timestamp for a single index
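
A minimal sketch of the two cases above, assuming a two-day daily range like the one used later in this post (the variable names idx, single, multiple and as_list are only illustrative):

```python
import pandas as pd

# Two-day daily index, similar to the example in the question
idx = pd.date_range('2016-01-01', '2016-01-02', freq='D')

single = idx[0]      # one integer position -> pandas Timestamp
multiple = idx[0:1]  # slice -> DatetimeIndex of length 1
as_list = idx[[0]]   # list of positions -> DatetimeIndex of length 1

print(type(single).__name__)
print(type(multiple).__name__)
print(type(as_list).__name__)
```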

The confusing part is that Timestamps do not equal np.datetime64, so that:

import numpy as np
import pandas as pd

a_datetimeindex = pd.date_range('1/1/2016', '1/2/2016', freq='D')
print(np.in1d(a_datetimeindex[0], a_datetimeindex))

This returns False. But:

print(np.in1d(a_datetimeindex[0:1], a_datetimeindex))
print(np.in1d(np.datetime64(a_datetimeindex[0]), a_datetimeindex))

Both of these return the expected result.

I guess that is because np.datetime64[ns] is accurate to the nanosecond, but the Timestamp is truncated?

My question is, is there a way to create the DatetimeIndex so that it always indexes to the same (or comparable) data type?


You are using numpy functions to manipulate pandas types. They are not always compatible.

The function np.in1d first converts both of its arguments to ndarrays. A DatetimeIndex has a built-in conversion, and an array of dtype np.datetime64 is returned (it's DatetimeIndex.values). But a Timestamp doesn't have such a facility, so it is not converted.
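
To see the difference in conversion, here is a small sketch that passes both kinds of object through np.asarray, which I am using as a stand-in for the conversion np.in1d performs internally (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2016-01-01', '2016-01-02', freq='D')

# The whole index converts to a native numpy datetime64 array
index_as_array = np.asarray(idx)

# A single Timestamp has no such conversion and is wrapped as a generic object
scalar_as_array = np.asarray(idx[0])

print(index_as_array.dtype)
print(scalar_as_array.dtype)
```

The second array holds the Timestamp as a plain Python object, so numpy ends up comparing an object array against a datetime64 array instead of two datetime64 arrays.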

Instead, you can use, for example, the Python keyword in (the most natural way):

a_datetimeindex[0] in a_datetimeindex

or the Index.isin method for a collection of elements.
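
A short sketch of both pandas-native checks, assuming a hypothetical second index called candidates that holds the dates to look up:

```python
import pandas as pd

idx = pd.date_range('2016-01-01', '2016-01-02', freq='D')

# Hypothetical dates to look up: the first is in idx, the second is not
candidates = pd.DatetimeIndex(['2016-01-01', '2016-01-05'])

# Single element: the plain Python membership test
is_member = idx[0] in idx

# Several elements: one boolean per element of candidates
mask = candidates.isin(idx)

print(is_member)
print(mask)
```

Note that isin answers "which elements of the calling index appear in the argument", so the collection being tested goes on the left.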


If you want to use np.in1d, explicitly convert both arguments to numpy types, or call it on the underlying numpy arrays:

np.in1d(a_datetimeindex.values[0], a_datetimeindex.values)
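
If you already hold a Timestamp rather than the index it came from, the explicit conversion could look like the sketch below. It uses Timestamp.to_datetime64 for the conversion and np.isin in place of np.in1d; that substitution is mine, made because recent NumPy releases recommend np.isin.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2016-01-01', '2016-01-02', freq='D')
ts = idx[0]  # a pandas Timestamp

# Convert the Timestamp to a numpy scalar before handing it to numpy
ts64 = ts.to_datetime64()

# Both arguments are now numpy datetime64 arrays
found = np.isin(np.array([ts64]), idx.values)

print(found)
```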

Alternatively, it's probably safe to use np.in1d with two collections of the same type:

np.in1d(a_datetimeindex, another_datetimeindex)

or even

np.in1d(a_datetimeindex[[0]], a_datetimeindex)

