I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need
If your dates are already standardized as datetime.date objects (from the datetime package), you can simply use:
df[(df['date']>datetime.date(2016,1,1)) & (df['date']<datetime.date(2016,3,1))]
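To make the comparison above concrete, here is a minimal sketch on made-up data. The sample DataFrame, the 'value' column and the names start, end and in_range are only for illustration, and it assumes the 'date' column holds plain datetime.date objects rather than datetime64[ns] values.

```python
import datetime
import pandas as pd

# Hypothetical sample data: the 'date' column holds plain datetime.date objects
df = pd.DataFrame({
    "date": [
        datetime.date(2015, 12, 31),
        datetime.date(2016, 1, 15),
        datetime.date(2016, 2, 20),
        datetime.date(2016, 3, 1),
    ],
    "value": [1, 2, 3, 4],
})

start = datetime.date(2016, 1, 1)
end = datetime.date(2016, 3, 1)

# Both bounds are exclusive here, matching the > and < in the expression above
in_range = df[(df["date"] > start) & (df["date"] < end)]
print(in_range)
```

Note that the row dated exactly 2016-03-01 is dropped because of the strict <; use <= if the end date should be included.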
To standardize your date strings using the datetime package, you can use this function:
import datetime
datetime.datetime.strptime
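As a rough sketch of how strptime might be applied, assuming the strings look like '2016-01-15' (the "%Y-%m-%d" format string and the sample data are my assumptions, adjust them to your data):

```python
import datetime
import pandas as pd

# Parse a single string; the format "%Y-%m-%d" is an assumption about the input
raw = "2016-01-15"
parsed = datetime.datetime.strptime(raw, "%Y-%m-%d")
as_date = parsed.date()  # drop the time part, keep a datetime.date

# Hypothetical column of date strings converted element by element
df = pd.DataFrame({"date": ["2015-12-31", "2016-01-15", "2016-02-20"]})
df["date"] = df["date"].apply(
    lambda s: datetime.datetime.strptime(s, "%Y-%m-%d").date()
)
print(df)
```

After this the column contains datetime.date objects, so it can be compared against datetime.date(...) values directly.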
If your datetime column has the Pandas datetime type (e.g. datetime64[ns]), you need a pd.Timestamp object for proper filtering, for example:
from datetime import date
import pandas as pd
value_to_check = pd.Timestamp(date.today().year, 1, 1)
filter_mask = df['date_column'] < value_to_check
filtered_df = df[filter_mask]
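For the original question (keeping only the next two months), the same pd.Timestamp idea could be sketched like this. The helper name next_two_months and its now parameter are hypothetical, and treating the window as "from today (inclusive) up to two calendar months later (exclusive)" is my assumption about what the asker wants.

```python
import pandas as pd

def next_two_months(df, column="date", now=None):
    """Keep rows whose `column` falls in [now, now + 2 calendar months)."""
    # Use midnight today unless a reference date is passed in explicitly
    start = pd.Timestamp.today().normalize() if now is None else pd.Timestamp(now)
    end = start + pd.DateOffset(months=2)
    mask = (df[column] >= start) & (df[column] < end)
    return df[mask]

# Hypothetical sample data with a datetime64[ns] column
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-12-31", "2016-01-15", "2016-02-29", "2016-03-01"])
})

# A fixed reference date makes the example reproducible
result = next_two_months(df, now="2016-01-01")
print(result)
```

Calling next_two_months(df) without now uses the current date instead.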
The shortest way to filter your DataFrame by date, assuming your date column has type datetime64[ns]:
# filter by single day
df = df[df['date'].dt.strftime('%Y-%m-%d') == '2014-01-01']
# filter by single month
df = df[df['date'].dt.strftime('%Y-%m') == '2014-01']
# filter by single year
df = df[df['date'].dt.strftime('%Y') == '2014']
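If you would rather not format every value as a string, the .dt accessor also exposes year, month and normalize(), so an equivalent sketch could look like this (the sample data and variable names are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with a datetime64[ns] column
df = pd.DataFrame({
    "date": pd.to_datetime(["2013-12-31", "2014-01-01", "2014-01-20", "2014-02-01"])
})

# filter by single day (normalize() sets the time part to midnight)
single_day = df[df["date"].dt.normalize() == pd.Timestamp("2014-01-01")]

# filter by single month
single_month = df[(df["date"].dt.year == 2014) & (df["date"].dt.month == 1)]

# filter by single year
single_year = df[df["date"].dt.year == 2014]

print(len(single_day), len(single_month), len(single_year))
```

These select the same rows as the strftime versions, just by comparing numbers and timestamps instead of strings.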
If the dates are in the index then simply:
df['20160101':'20160301']
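A small sketch of that slice on made-up data, assuming the index is a DatetimeIndex. I sort the index first and use .loc, which makes it explicit that rows are being selected by label.

```python
import pandas as pd

# Hypothetical sample data with the dates in the index
df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.to_datetime(["2015-12-31", "2016-01-01", "2016-02-15", "2016-03-02"]),
).sort_index()

# Label-based slicing on a DatetimeIndex includes both endpoints
subset = df.loc["20160101":"20160301"]
print(subset)
```

Unlike the > / < comparison on a column, both ends of the slice are inclusive, so the row for 2016-01-01 is kept.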
I'm not allowed to write comments yet, so I'll write an answer instead, in case somebody reads through all of them and reaches this one.
If the index of the dataset is a datetime and you want to filter by (for example) month, you can do the following:
df.loc[df.index.month == 3]
That will filter the dataset for you by March.
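Here is a minimal sketch of the month filter on made-up data. It builds the boolean mask with == and assumes the index is a DatetimeIndex; the sample values are only for illustration.

```python
import pandas as pd

# Hypothetical sample data with a DatetimeIndex spanning two years
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.to_datetime(["2016-02-28", "2016-03-05", "2017-03-31"]),
)

# Boolean mask: True where the index month is March
march = df.loc[df.index.month == 3]
print(march)
```

Keep in mind that this matches March of every year in the data; add a condition on df.index.year if you only want one particular March.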
You could just select the time range by doing: df.loc['start_date':'end_date']