Filtering Pandas DataFrames on dates


I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need


12 Answers

名媛妹妹

2020-11-22 16:34

You can use pd.Timestamp inside DataFrame.query by referencing it through a local alias:

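from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame()
ts = pd.Timestamp  # local alias, referenced inside query() as @ts

# ten consecutive one-second timestamps starting from the current time
df['date'] = np.array(np.arange(10) + datetime.now().timestamp(), dtype='M8[s]')

print(df)
print(df.query('date > @ts("20190515T071320")'))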

with the output

                 date
0 2019-05-15 07:13:16
1 2019-05-15 07:13:17
2 2019-05-15 07:13:18
3 2019-05-15 07:13:19
4 2019-05-15 07:13:20
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25


                 date
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25

Have a look at the pandas documentation for DataFrame.query, specifically the part about local variables referenced using the @ prefix. In this case we reference pd.Timestamp through the local alias ts so that we can supply a timestamp string.
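The same @ mechanism can also carry the range bounds themselves, which is closer to the "next two months" filter in the question. The following is only a sketch under assumptions: the sample data, the fixed reference date, and the names start and end are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data: one row every 30 days from a fixed date.
df = pd.DataFrame({"date": pd.date_range("2019-05-01", periods=6, freq="30D")})

# A fixed reference date keeps the example reproducible; in practice this
# would likely be something like pd.Timestamp.now().normalize().
start = pd.Timestamp("2019-05-15")
end = start + pd.DateOffset(months=2)

# @start and @end refer to the local Python variables defined above.
next_two_months = df.query("date >= @start and date <= @end")
print(next_two_months)
```

Using pd.DateOffset(months=2) rather than a fixed number of days keeps the upper bound on the same day of the month two months later.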


既然无缘

2020-11-22 16:36
In my experience the previous answer is not correct: you can't pass a plain string, it needs to be a datetime object. So:
```
import datetime 
df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2014,month=2,day=1)]
```
轮回少年

2020-11-22 16:39
When loading the CSV file with pd.read_csv, set the date column as the index, as shown below, so that you can filter on a range of dates. This was not needed with the now-deprecated pd.DataFrame.from_csv(), which used the first column as the index and parsed dates by default.

To show only the data for January and February, e.g. 2020-01-01 to 2020-02-29:
```
import pandas as pd
mydata = pd.read_csv('mydata.csv', index_col='date', parse_dates=True)  # or the column position, e.g. index_col=0
mydata.loc['2020-01-01':'2020-02-29']  # pulls all the columns
# if you need just one column, e.g. Cost:
mydata.loc['2020-01-01':'2020-02-29', 'Cost']
```
This has been tested working for Python 3.7. Hope you will find this useful.
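For readers without a 'mydata.csv' at hand, here is a self-contained sketch of the same idea. The CSV contents and the Units column are invented for illustration; parse_dates=True is what turns the index into a DatetimeIndex so that string slices are interpreted as dates.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'mydata.csv'.
csv_text = """date,Cost,Units
2019-12-31,10,1
2020-01-15,20,2
2020-02-29,30,3
2020-03-01,40,4
"""

mydata = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)
mydata = mydata.sort_index()  # label slicing is most reliable on a sorted index

jan_feb = mydata.loc["2020-01-01":"2020-02-29"]               # all columns
jan_feb_cost = mydata.loc["2020-01-01":"2020-02-29", "Cost"]  # one column
print(jan_feb)
```

Label slices on a DatetimeIndex include both endpoints, so the row dated 2020-02-29 is part of the result.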
猫巷女王i

2020-11-22 16:39

If you have already converted the column from strings to datetimes using pd.to_datetime, you can just use:

df = df[(df['Date'] > "2018-01-01") & (df['Date'] < "2019-07-01")]
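A runnable sketch of this comparison, including the pd.to_datetime step, might look as follows. The sample rows and the value column are assumptions made for the example. Series.between is shown as an alternative for when both endpoints should be included.

```python
import pandas as pd

# Hypothetical data with dates stored as strings, as they would be after
# reading a file without date parsing.
df = pd.DataFrame({
    "Date": ["2017-12-31", "2018-06-15", "2019-06-30", "2019-07-01"],
    "value": [1, 2, 3, 4],
})
df["Date"] = pd.to_datetime(df["Date"])  # convert before comparing

# Strict bounds, as in the answer above: both endpoints are excluded.
mask = (df["Date"] > "2018-01-01") & (df["Date"] < "2019-07-01")
filtered = df[mask]

# Series.between includes both endpoints by default.
inclusive = df[df["Date"].between("2018-01-01", "2019-07-01")]
print(filtered)
print(inclusive)
```

The only difference between the two results is the row that falls exactly on the upper bound.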

天命终不由人

2020-11-22 16:42
How about using pyjanitor?

Among other features, it adds a filter_date method to DataFrames.

After pip install pyjanitor:
```
import janitor

df_filtered = df.filter_date(your_date_column_name, start_date, end_date)
```
孤独总比滥情好

2020-11-22 16:43
If date column is the index, then use .loc for label based indexing or .iloc for positional indexing.

For example:
```
df.loc['2014-01-01':'2014-02-01']
```
See details here http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection

If the column is not the index you have two choices:
1. Make it the index (either temporarily or permanently if it's time-series data)
2. Filter with a boolean mask: df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]
See here for the general explanation

Note: .ix is deprecated.
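The two choices can be compared side by side in a short sketch. The data below is made up, and the sort_index call is an addition not mentioned in the answer: slicing an unsorted DatetimeIndex with labels that are not present in the index can fail, so sorting first is the safer route.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample data.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-02-01", "2013-12-31", "2014-01-15", "2014-02-02"]),
    "value": [3, 1, 2, 4],
})

# Choice 1: make the column the index temporarily, slice by label,
# then turn the index back into a regular column.
by_index = (
    df.set_index("date")
      .sort_index()
      .loc["2014-01-01":"2014-02-01"]
      .reset_index()
)

# Choice 2: boolean mask on the column. Inclusive comparisons are used
# here so that both approaches select the same rows.
by_mask = df[(df["date"] >= "2014-01-01") & (df["date"] <= "2014-02-01")]

print(by_index)
print(by_mask)
```

The mask keeps the original row order and index labels, while the index-based route returns rows in date order.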