Select DataFrame rows between two dates

挽巷 2020-11-22 03:14

I am creating a DataFrame from a csv as follows:

stock = pd.read_csv('data_in/' + filename + '.csv', skipinitialspace=True)

The DataFra

10 Answers
  • 2020-11-22 04:01

    I prefer not to alter the df.

    One option is to look up the index labels of the start and end dates and slice between them:

    import numpy as np
    import pandas as pd
    
    # Dummy DataFrame
    df = pd.DataFrame(np.random.random((30, 3)))
    df['date'] = pd.date_range('2017-01-01', periods=30, freq='D')
    
    # Get the index labels of the start and end dates respectively
    start = df[df['date'] == '2017-01-07'].index[0]
    end = df[df['date'] == '2017-01-14'].index[0]
    
    # Show the sliced df (from 2017-01-07 to 2017-01-14, both inclusive)
    df.loc[start:end]
    

    which results in:

         0   1   2       date
    6  0.5 0.8 0.8 2017-01-07
    7  0.0 0.7 0.3 2017-01-08
    8  0.8 0.9 0.0 2017-01-09
    9  0.0 0.2 1.0 2017-01-10
    10 0.6 0.1 0.9 2017-01-11
    11 0.5 0.3 0.9 2017-01-12
    12 0.5 0.4 0.3 2017-01-13
    13 0.4 0.9 0.9 2017-01-14
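
The index lookup above assumes that both dates actually occur in the column and that the rows are in date order. If that is not guaranteed, a boolean mask may be a safer way to get the same slice without altering df. This is only a minimal sketch: `start_date`, `end_date`, `mask` and `sliced` are illustrative names, not part of the answer.

```python
import numpy as np
import pandas as pd

# Dummy DataFrame, same shape as in the answer above
df = pd.DataFrame(np.random.random((30, 3)))
df['date'] = pd.date_range('2017-01-01', periods=30, freq='D')

# Illustrative bounds (hypothetical names); between() includes both ends by default
start_date = pd.Timestamp('2017-01-07')
end_date = pd.Timestamp('2017-01-14')

# Build a boolean mask and select with it; df itself is not modified
mask = df['date'].between(start_date, end_date)
sliced = df.loc[mask]

print(sliced)
```

Because the mask compares values rather than positions, it should also work when a boundary date is missing from the data or the rows are unsorted.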
    
  • 2020-11-22 04:03

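    You can use the isin method on the date column, like so: df[df["date"].isin(pd.date_range(start_date, end_date))]

    Note: This only works with pure dates (as the question asks), not with timestamps that carry a time of day. pd.date_range generates one value per day at midnight, and isin matches exact values only.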

    Example:

    import numpy as np   
    import pandas as pd
    
    # Make a DataFrame with dates and random numbers
    df = pd.DataFrame(np.random.random((30, 3)))
    df['date'] = pd.date_range('2017-1-1', periods=30, freq='D')
    
    # Select the rows between two dates
    in_range_df = df[df["date"].isin(pd.date_range("2017-01-15", "2017-01-20"))]
    
    print(in_range_df)  # print result
    

    which gives

               0         1         2       date
    14  0.960974  0.144271  0.839593 2017-01-15
    15  0.814376  0.723757  0.047840 2017-01-16
    16  0.911854  0.123130  0.120995 2017-01-17
    17  0.505804  0.416935  0.928514 2017-01-18
    18  0.204869  0.708258  0.170792 2017-01-19
    19  0.014389  0.214510  0.045201 2017-01-20
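
To illustrate the note about timestamps, here is a small sketch with made-up data in which only one row falls exactly on midnight. Normalizing the column with `.dt.normalize()` before calling `isin` is one possible workaround; it is my assumption about what a reader would want here, not something the answer proposes. The names `wanted`, `exact` and `by_day` are illustrative.

```python
import pandas as pd

# Made-up timestamps; only the first one is exactly at midnight
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-01-15 00:00',
        '2017-01-15 09:30',
        '2017-01-16 18:45',
        '2017-01-17 12:00',
    ])
})

# Daily range: one Timestamp per day, each at 00:00
wanted = pd.date_range('2017-01-15', '2017-01-16')

# Exact matching: only the midnight row is selected
exact = df[df['date'].isin(wanted)]

# Matching on the calendar day: drop the time of day before comparing
by_day = df[df['date'].dt.normalize().isin(wanted)]

print(len(exact), len(by_day))
```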
    
  • 2020-11-22 04:07

    You can do it with pd.date_range() and Timestamp. Let's say you have read a CSV file with a date column using the parse_dates option:

    df = pd.read_csv('my_file.csv', parse_dates=['my_date_col'])
    

    Then you can define a date range index (here, the two days ending on 15 June 2020):

    rge = pd.date_range(end='2020-06-15', periods=2)
    

    and then filter your values by date with a map, normalizing each timestamp to midnight so that it can be matched against the range:

    df.loc[df['my_date_col'].map(lambda ts: ts.normalize() in rge)]
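
For anyone who wants to try this without a file on disk, here is a self-contained sketch. The CSV content is made up and read from an in-memory string in place of my_file.csv. It also compares a map-based filter with a vectorized one; `csv_text`, `via_map` and `via_isin` are illustrative names.

```python
import io

import pandas as pd

# Stand-in for 'my_file.csv' (made-up sample rows)
csv_text = """my_date_col,value
2020-06-13 08:00:00,1
2020-06-14 12:30:00,2
2020-06-15 23:59:00,3
2020-06-16 00:00:00,4
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['my_date_col'])

# Two consecutive days ending on 15 June 2020
rge = pd.date_range(end='2020-06-15', periods=2)

# Element-wise check: is the calendar day of each timestamp in the range?
via_map = df.loc[df['my_date_col'].map(lambda ts: ts.normalize() in rge)]

# Vectorized equivalent of the same check
via_isin = df.loc[df['my_date_col'].dt.normalize().isin(rge)]

print(via_isin)
```

Both filters should select the same rows here. The vectorized form avoids a Python-level call per row, which may matter on large frames.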
    
  • 2020-11-22 04:08

    Inspired by unutbu

    print(df.dtypes)                                          # Check the column dtypes first. Once a column becomes the index it no longer appears here.
    columnName = 'YourColumnName'
    df[columnName+'index'] = pd.to_datetime(df[columnName])   # Create a datetime copy of the column to use as the index
    df.set_index(columnName+'index', inplace=True)            # Build the index on the timestamps/dates
    df.sort_index(inplace=True)                               # Sort so that slicing by date strings works reliably
    df.loc['2020-09-03 01:00':'2020-09-06']                   # Select the range from the index. This returns the selected rows as a new DataFrame.
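
A runnable sketch of the same idea on made-up hourly data, assuming the column is converted to datetime before it becomes the index. It uses non-inplace calls so the original df stays untouched; `times`, `indexed` and `selected` are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up hourly readings from 2020-09-01 00:00 to 2020-09-08 23:00
times = pd.date_range('2020-09-01', '2020-09-08 23:00', freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'YourColumnName': times, 'value': np.arange(len(times))})

columnName = 'YourColumnName'

# Copy the column into a new datetime column, index on it, and sort
indexed = (
    df.assign(**{columnName + 'index': pd.to_datetime(df[columnName])})
      .set_index(columnName + 'index')
      .sort_index()
)

# Slice by date strings; the end label '2020-09-06' covers that whole day
selected = indexed.loc['2020-09-03 01:00':'2020-09-06']

print(selected.index.min(), selected.index.max())
```

Note that both ends of a label slice are inclusive, so the result runs through the last timestamp on 2020-09-06.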
    