Question
I have a CSV file of the following format:
vm,time,LoadInt1
abc-webapp-02,2017-05-31 10:00:00,3.133333
abc-webapp-02,2017-05-31 10:05:00,0.000000
abc-webapp-02,2017-05-31 10:10:00,0.000000
abc-webapp-02,2017-05-31 10:15:00,0.000000
abc-webapp-02,2017-05-31 10:20:00,0.000000
abc-webapp-02,2017-05-31 10:25:00,0.000000
abc-webapp-02,2017-05-31 10:30:00,0.000000
abc-webapp-02,2017-05-31 10:35:00,0.000000
abc-webapp-02,2017-05-31 10:40:00,0.000000
I read the CSV file into a DataFrame using the following code, which parses the time column as the index (a DatetimeIndex):
import pandas as pd
from datetime import datetime
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv("my_file.csv", header=0, parse_dates=[1], index_col=1, date_parser=dateparse)
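The custom date_parser argument was deprecated in pandas 2.0 (date_format replaces it there), and pd.datetime no longer exists in recent pandas. A version-independent alternative is to read the column as text and convert it with pd.to_datetime and the same explicit format. This is a minimal sketch, assuming a small in-memory excerpt of my_file.csv held in a hypothetical CSV_TEXT string:

```python
import io

import pandas as pd

# Hypothetical in-memory excerpt standing in for my_file.csv
CSV_TEXT = """vm,time,LoadInt1
abc-webapp-02,2017-05-31 10:00:00,3.133333
abc-webapp-02,2017-05-31 10:05:00,0.000000
abc-webapp-02,2017-05-31 10:10:00,0.000000
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
# Convert the text column using an explicit format, then make it the index
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S")
df = df.set_index("time")

print(type(df.index).__name__)  # DatetimeIndex
```

Converting after reading keeps the format string visible and avoids relying on read_csv's parser arguments, which have changed across pandas versions.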
Now I am trying to get all the rows between two dates using the following code (the real CSV file has a large number of rows between these dates):
df.loc['2017-05-30' : '2017-05-31']
Please note that the above approach is suggested here, but it is not working for me, so this may not be a duplicate question.
Answer 1:
Using the query method:
df = pd.read_csv("my_file.csv", index_col=1, parse_dates=True)
In [121]: df.query("'2017-05-30' <= index <= '2017-06-01'")
Out[121]:
vm LoadInt1
time
2017-05-31 10:00:00 abc-webapp-02 3.133333
2017-05-31 10:05:00 abc-webapp-02 0.000000
2017-05-31 10:10:00 abc-webapp-02 0.000000
2017-05-31 10:15:00 abc-webapp-02 0.000000
2017-05-31 10:20:00 abc-webapp-02 0.000000
2017-05-31 10:25:00 abc-webapp-02 0.000000
2017-05-31 10:30:00 abc-webapp-02 0.000000
2017-05-31 10:35:00 abc-webapp-02 0.000000
2017-05-31 10:40:00 abc-webapp-02 0.000000
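Inside the query expression, the name index refers to the DataFrame's DatetimeIndex, and the string bounds are compared against it as timestamps. A minimal sketch that rebuilds the nine sample rows in memory with pd.date_range (an assumption standing in for my_file.csv) and checks the query result against an equivalent explicit boolean mask:

```python
import pandas as pd

# Rebuild the sample data: nine 5-minute readings for one VM
idx = pd.date_range("2017-05-31 10:00", periods=9, freq="5min", name="time")
df = pd.DataFrame(
    {"vm": "abc-webapp-02", "LoadInt1": [3.133333] + [0.0] * 8}, index=idx
)

# String bounds inside query() are compared against the DatetimeIndex
via_query = df.query("'2017-05-30' <= index <= '2017-06-01'")

# Equivalent boolean mask with explicit Timestamp bounds
start, end = pd.Timestamp("2017-05-30"), pd.Timestamp("2017-06-01")
via_mask = df[(df.index >= start) & (df.index <= end)]

print(via_query.equals(via_mask))  # True
print(len(via_query))  # 9
```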
Answer 2:
This type of index slicing includes both endpoints (and a date-only end label such as '2017-05-31' covers that whole day), so your slice returns the entire sample set:
df.loc['2017-05-30':'2017-05-31']  # or: df['2017-05-30':'2017-05-31']
vm LoadInt1
time
2017-05-31 10:00:00 abc-webapp-02 3.133333
2017-05-31 10:05:00 abc-webapp-02 0.000000
2017-05-31 10:10:00 abc-webapp-02 0.000000
2017-05-31 10:15:00 abc-webapp-02 0.000000
2017-05-31 10:20:00 abc-webapp-02 0.000000
2017-05-31 10:25:00 abc-webapp-02 0.000000
2017-05-31 10:30:00 abc-webapp-02 0.000000
2017-05-31 10:35:00 abc-webapp-02 0.000000
2017-05-31 10:40:00 abc-webapp-02 0.000000
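To see why the original slice looks like it "does nothing", it helps to compare it with a slice that ends a day earlier. A minimal sketch, rebuilding the nine sample rows in memory with pd.date_range rather than reading my_file.csv:

```python
import pandas as pd

# Rebuild the sample data: nine 5-minute readings on 2017-05-31
idx = pd.date_range("2017-05-31 10:00", periods=9, freq="5min", name="time")
df = pd.DataFrame(
    {"vm": "abc-webapp-02", "LoadInt1": [3.133333] + [0.0] * 8}, index=idx
)

# A date-only end label covers the whole of 2017-05-31, so every row matches
full = df.loc["2017-05-30":"2017-05-31"]
print(len(full))  # 9

# Ending the slice on 2017-05-30 matches nothing: all rows fall on the 31st
only_30th = df.loc["2017-05-30":"2017-05-30"]
print(len(only_30th))  # 0
```

On a file that spans several days, the first slice would drop rows outside May 30 to 31, which is the subsetting the question expects.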
Narrowing the bounds to specific times shows that the slice really does subset the data, again including both endpoints:
df.loc['2017-05-31 10:10':'2017-05-31 10:35']
vm LoadInt1
time
2017-05-31 10:10:00 abc-webapp-02 0.0
2017-05-31 10:15:00 abc-webapp-02 0.0
2017-05-31 10:20:00 abc-webapp-02 0.0
2017-05-31 10:25:00 abc-webapp-02 0.0
2017-05-31 10:30:00 abc-webapp-02 0.0
2017-05-31 10:35:00 abc-webapp-02 0.0
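Because both labels are kept, a label slice behaves like a closed interval. If an exclusive end bound is wanted instead, a boolean mask is one option. A minimal sketch, rebuilding the sample rows in memory with pd.date_range rather than reading my_file.csv:

```python
import pandas as pd

# Rebuild the sample data: nine 5-minute readings on 2017-05-31
idx = pd.date_range("2017-05-31 10:00", periods=9, freq="5min", name="time")
df = pd.DataFrame(
    {"vm": "abc-webapp-02", "LoadInt1": [3.133333] + [0.0] * 8}, index=idx
)

# Closed interval: 10:10, 10:15, ..., 10:35 are all included
closed = df.loc["2017-05-31 10:10":"2017-05-31 10:35"]

# Half-open interval [start, end): the 10:35 row is excluded
start, end = pd.Timestamp("2017-05-31 10:10"), pd.Timestamp("2017-05-31 10:35")
half_open = df[(df.index >= start) & (df.index < end)]

print(len(closed), len(half_open))  # 6 5
```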
Your import can also be simplified; you don't need the custom date parser:
df = pd.read_csv("my_file.csv", parse_dates=[1], index_col=1)
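A minimal sketch of the simplified import followed by a time-range slice, assuming a short in-memory excerpt of my_file.csv in a hypothetical CSV_TEXT string. pandas recognises the ISO-like timestamps without a custom parser:

```python
import io

import pandas as pd

# Hypothetical in-memory excerpt standing in for my_file.csv
CSV_TEXT = """vm,time,LoadInt1
abc-webapp-02,2017-05-31 10:00:00,3.133333
abc-webapp-02,2017-05-31 10:05:00,0.000000
abc-webapp-02,2017-05-31 10:10:00,0.000000
"""

# No date_parser: column 1 becomes the index and is parsed as datetimes
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=[1], index_col=1)

# Label slicing on the resulting DatetimeIndex includes both endpoints
subset = df.loc["2017-05-31 10:00":"2017-05-31 10:05"]
print(len(subset))  # 2
```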
Source: https://stackoverflow.com/questions/45084935/select-rows-between-two-datetimeindex-dates