I am trying to filter the columns in a pandas DataFrame based on whether they are of type date or not. I can figure out which ones are, but would then have to parse that output manually. Is there a cleaner way to select them directly?
A bit uglier NumPy alternative:
In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
Out[102]:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
Out[103]:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
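A self-contained sketch of the NumPy approach above, using a small sample frame (the column names `col1`, `col2`, `date_col` are assumed from the output shown):

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame matching the output shown above
df = pd.DataFrame({
    "col1": [1, 1, 1, 1],
    "col2": [2, 2, 2, 2],
    "date_col": pd.to_datetime(["2017-02-01", "2017-03-01",
                                "2017-04-01", "2017-05-01"]),
})

# Keep only the datetime columns
dates = df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
# Keep only the numeric columns
nums = df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]

print(list(dates.columns))  # ['date_col']
print(list(nums.columns))   # ['col1', 'col2']
```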
I just encountered this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular, np.datetime64 doesn't seem to match datetime64[ns, UTC].
df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
print(df.date_col.dtype)  # datetime64[ns, UTC]
You could also extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatibility, so I ended up using the is_datetime64_any_dtype function from the pandas API instead.
In:
from pandas.api.types import is_datetime64_any_dtype as is_datetime
df[[column for column in df.columns if is_datetime(df[column])]]
Out:
date_col
0 2017-02-01 00:00:00+00:00
1 2017-03-01 00:00:00+00:00
2 2017-04-01 00:00:00+00:00
3 2017-05-01 00:00:00+00:00
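A runnable sketch of the approach, with an assumed tz-aware sample column to show that is_datetime64_any_dtype also matches datetime64[ns, UTC]:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype as is_datetime

# Hypothetical frame with a tz-aware datetime column
df = pd.DataFrame({
    "col1": [1, 2],
    "date_col": pd.to_datetime(["2017-02-01", "2017-03-01"], utc=True),
})

# is_datetime matches both naive and tz-aware datetime dtypes
datetime_cols = [c for c in df.columns if is_datetime(df[c])]
print(datetime_cols)            # ['date_col']
print(df["date_col"].dtype)     # datetime64[ns, UTC]
```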
Pandas has a handy function called select_dtypes, which can take either exclude or include (or both) as parameters and filters the DataFrame's columns by dtype. In this case you would want to include columns of dtype np.datetime64. To filter integers you could use [np.int64, np.int32, np.int16], for floats [np.float32, np.float64, np.float16], and to keep all numeric columns simply [np.number]. (The bare np.int and np.float aliases were deprecated and later removed from NumPy, so avoid them.)
In:
df.select_dtypes(include=[np.datetime64])
Out:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In:
df.select_dtypes(include=[np.number])
Out:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
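The same select_dtypes calls as a self-contained sketch, with the sample frame assumed from the outputs above; `exclude` is shown as well since the answer mentions it:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame matching the outputs above
df = pd.DataFrame({
    "col1": [1, 1, 1, 1],
    "col2": [2, 2, 2, 2],
    "date_col": pd.to_datetime(["2017-02-01", "2017-03-01",
                                "2017-04-01", "2017-05-01"]),
})

# include keeps only matching dtypes; exclude drops them
print(list(df.select_dtypes(include=[np.datetime64]).columns))  # ['date_col']
print(list(df.select_dtypes(include=[np.number]).columns))      # ['col1', 'col2']
print(list(df.select_dtypes(exclude=[np.number]).columns))      # ['date_col']
```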
Recently I needed to check whether any element of a column was a date or numeric. My approach was to try converting the column to the target type (datetime or numeric) and then check whether any element failed to convert:
pd.to_datetime(data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce')
output:
0 2010-09-16
1 2010-09-16
2 2018-06-04
3 NaT
4 NaT
5 2018-11-30
Then use isnull() to check whether the elements could be converted:
pd.to_datetime(data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce').isnull().any()
This will return True because at least one element is null/NaT.
To check for numerics:
data_temp.eval('col_name').astype(str).str.isnumeric().all()
This will return True only if all elements in the column are numeric.
Both return a numpy.bool_, which can easily be converted to a plain bool if needed:
type(pd.to_datetime(data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce').isnull().any())
output:
numpy.bool_
--
type(bool(pd.to_datetime(data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce').isnull().any()))
output:
bool
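A minimal end-to-end sketch of this coerce-then-check idea, assuming a hypothetical column mixing valid and invalid date strings (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical column with a mix of valid and invalid date strings
data_temp = pd.DataFrame({
    "col_name": ["16/09/2010", "16/09/2010", "04/06/2018",
                 "foo", "bar", "30/11/2018"]
})

# Invalid dates become NaT under errors='coerce'
converted = pd.to_datetime(data_temp["col_name"], format="%d/%m/%Y",
                           errors="coerce")
has_invalid = bool(converted.isnull().any())
print(has_invalid)  # True: 'foo' and 'bar' became NaT

# Numeric check: True only if every element looks numeric
all_numeric = bool(data_temp["col_name"].astype(str).str.isnumeric().all())
print(all_numeric)  # False: slashes and letters are not numeric
```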
This code automatically identifies date columns and changes their datatype from object to datetime64[ns]. Once you have the datetime dtype you can easily perform other operations.
for col in data.columns:
    if data[col].dtype == 'object':
        try:
            data[col] = pd.to_datetime(data[col])
        except (ValueError, TypeError):
            # not a date column; leave it as object
            pass
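The loop above can be exercised on a small mixed frame; the sample data here is assumed for illustration. Object columns that parse as dates are converted, everything else is left untouched:

```python
import pandas as pd

# Hypothetical frame: one date-like column, one plain text column, one numeric
data = pd.DataFrame({
    "date_str": ["2017-02-01", "2017-03-01"],
    "text": ["foo", "bar"],
    "num": [1, 2],
})

for col in data.columns:
    if data[col].dtype == 'object':
        try:
            data[col] = pd.to_datetime(data[col])
        except (ValueError, TypeError):
            # 'foo'/'bar' cannot be parsed, so 'text' stays object
            pass

print(data.dtypes)
```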