Filtering Pandas DataFrames on dates

时光取名叫无心 2020-11-22 16:15

I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need

12 Answers
  • 2020-11-22 16:34

    You can use pd.Timestamp inside DataFrame.query by referencing it through a local variable:

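    import pandas as pd
    import numpy as np
    from datetime import datetime
    
    df = pd.DataFrame()
    ts = pd.Timestamp  # local alias, referenced as @ts inside the query string
    
    # ten timestamps one second apart, starting from the current time
    df['date'] = np.array(np.arange(10) + datetime.now().timestamp(), dtype='M8[s]')
    
    print(df)
    print(df.query('date > @ts("20190515T071320")'))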
    

    with the output

                     date
    0 2019-05-15 07:13:16
    1 2019-05-15 07:13:17
    2 2019-05-15 07:13:18
    3 2019-05-15 07:13:19
    4 2019-05-15 07:13:20
    5 2019-05-15 07:13:21
    6 2019-05-15 07:13:22
    7 2019-05-15 07:13:23
    8 2019-05-15 07:13:24
    9 2019-05-15 07:13:25
    
    
                     date
    5 2019-05-15 07:13:21
    6 2019-05-15 07:13:22
    7 2019-05-15 07:13:23
    8 2019-05-15 07:13:24
    9 2019-05-15 07:13:25
    

    Have a look at the pandas documentation for DataFrame.query, specifically the part about referencing local variables with the @ prefix. Here we reference pd.Timestamp through the local alias ts so that we can supply a timestamp string.
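
    The same @ mechanism also works with a ready-made Timestamp instead of an alias for the class. The following is a minimal sketch under that assumption; the sample data and the names cutoff and later are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical sample data: ten timestamps one second apart.
df = pd.DataFrame({
    "date": pd.date_range("2019-05-15 07:13:16", periods=10,
                          freq=pd.Timedelta(seconds=1))
})

# A local variable is referenced inside the query string with the @ prefix.
cutoff = pd.Timestamp("2019-05-15 07:13:20")
later = df.query("date > @cutoff")

print(later)
```

    Building the Timestamp outside the query string keeps the expression short and lets you reuse the cutoff elsewhere.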

  • 2020-11-22 16:36

    In my experience the previous answer is not correct: you can't pass a plain string, it needs to be a datetime object. So:

    import datetime 
    df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2014,month=2,day=1)]
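
    For comparison, here is a small sketch of label slicing on a sorted DatetimeIndex with datetime objects and with date strings. It uses datetime.datetime rather than datetime.date (my substitution, not the code above), and the frame and the names by_object and by_string are hypothetical. In the pandas versions I have used, both forms select the same rows, so whether strings are accepted may depend on your version and on the index type.

```python
import datetime
import pandas as pd

# Hypothetical daily data with a sorted DatetimeIndex.
idx = pd.date_range("2013-12-30", "2014-02-03", freq="D")
df = pd.DataFrame({"value": list(range(len(idx)))}, index=idx)

start = datetime.datetime(2014, 1, 1)
end = datetime.datetime(2014, 2, 1)

# Slice with datetime objects (both endpoints are included).
by_object = df.loc[start:end]

# Slice with date strings (partial string indexing on a DatetimeIndex).
by_string = df.loc["2014-01-01":"2014-02-01"]

print(by_object.equals(by_string))
```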
    
  • 2020-11-22 16:39

    When loading the CSV file with pd.read_csv, set the date column as the index (and parse it as dates) so that you can filter on a range of dates. This was not needed with the now-deprecated pd.DataFrame.from_csv().

    If you just want the data for the two months from January to February, e.g. 2020-01-01 to 2020-02-29, you can do this:

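    import pandas as pd
    # index_col can also be a position, e.g. index_col=0;
    # parse_dates=True turns the index into a DatetimeIndex
    mydata = pd.read_csv('mydata.csv', index_col='date', parse_dates=True)
    mydata = mydata.sort_index()  # label slicing expects a sorted index
    mydata.loc['2020-01-01':'2020-02-29']  # all columns
    # if you need just one column, e.g. Cost:
    mydata.loc['2020-01-01':'2020-02-29', 'Cost']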
    

    This was tested with Python 3.7. I hope you find it useful.
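
    To try the idea without a file on disk, here is a self-contained sketch that reads the CSV from an in-memory string. The contents of csv_text and the column names Cost and Units are made up for illustration; parse_dates=True is what turns the index into a DatetimeIndex, so the date strings are matched as dates and not as plain text.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for mydata.csv.
csv_text = """date,Cost,Units
2019-12-31,10,1
2020-01-01,11,2
2020-01-15,12,3
2020-02-29,13,4
2020-03-01,14,5
"""

# Use the date column as the index and parse it into a DatetimeIndex.
mydata = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)
mydata = mydata.sort_index()

# Rows between the two dates, both endpoints included.
window = mydata.loc["2020-01-01":"2020-02-29"]

# The same rows, restricted to a single column.
cost = mydata.loc["2020-01-01":"2020-02-29", "Cost"]

print(window)
```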

  • 2020-11-22 16:39

    If you have already converted the string to a date format using pd.to_datetime you can just use:

    df = df[(df['Date']> "2018-01-01") & (df['Date']< "2019-07-01")]
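
    Applied to the "next two months" from the question, the same boolean mask could look like the sketch below. The sample rows, the fixed value of today, and the choice to include both endpoints are assumptions for illustration; in real code you would probably start from pd.Timestamp.now().normalize().

```python
import pandas as pd

# Hypothetical data; "today" is fixed so the result is reproducible.
today = pd.Timestamp("2020-11-22")
df = pd.DataFrame({
    "Date": ["2020-11-01", "2020-11-22", "2020-12-15", "2021-01-22", "2021-02-01"]
})
df["Date"] = pd.to_datetime(df["Date"])

# DateOffset(months=2) moves by calendar months, not by a fixed number of days.
end = today + pd.DateOffset(months=2)

# Keep rows from today up to and including the date two months ahead.
mask = (df["Date"] >= today) & (df["Date"] <= end)
next_two_months = df[mask]

print(next_two_months)
```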

  • 2020-11-22 16:42

    How about using pyjanitor? It adds convenience methods to DataFrames, including filter_date.

    After pip install pyjanitor:

    import janitor
    
    df_filtered = df.filter_date(your_date_column_name, start_date, end_date)
    
  • 2020-11-22 16:43

    If the date column is the index, use .loc for label-based indexing or .iloc for positional indexing.

    For example:

    df.loc['2014-01-01':'2014-02-01']
    

    See details here http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection

    If the column is not the index you have two choices:

    1. Make it the index (either temporarily or permanently if it's time-series data)
    2. df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]
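
    Both choices can be sketched side by side as follows. The frame and the names indexed, by_index and by_mask are hypothetical. Note that the slice endpoints are inclusive while the mask uses strict comparisons, so the index version starts at 2013-01-02 and ends at 2013-01-31 to select the same rows.

```python
import pandas as pd

# Hypothetical data with a regular 'date' column.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2012-12-31", "2013-01-01", "2013-01-15", "2013-02-01", "2013-02-02"]
    ),
    "value": [1, 2, 3, 4, 5],
})

# Choice 1: make the column the index, sort it, then slice by label.
indexed = df.set_index("date").sort_index()
by_index = indexed.loc["2013-01-02":"2013-01-31"]

# Choice 2: keep the column and combine two comparisons into a boolean mask.
mask = (df["date"] > "2013-01-01") & (df["date"] < "2013-02-01")
by_mask = df[mask]

print(by_index)
print(by_mask)
```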

    See here for the general explanation

    Note: .ix is deprecated and was removed in pandas 1.0; use .loc or .iloc instead.
