How to select a column for a specific time range from a pandas DataFrame in Python 3?

Question

This is my pandas DataFrame:

                     time    energy
0     2018-01-01 00:15:00    0.0000
1     2018-01-01 00:30:00    0.0000
2     2018-01-01 00:45:00    0.0000
3     2018-01-01 01:00:00    0.0000
4     2018-01-01 01:15:00    0.0000
5     2018-01-01 01:30:00    0.0000
6     2018-01-01 01:45:00    0.0000
7     2018-01-01 02:00:00    0.0000
8     2018-01-01 02:15:00    0.0000
9     2018-01-01 02:30:00    0.0000
10    2018-01-01 02:45:00    0.0000
11    2018-01-01 03:00:00    0.0000
12    2018-01-01 03:15:00    0.0000
13    2018-01-01 03:30:00    0.0000
14    2018-01-01 03:45:00    0.0000
15    2018-01-01 04:00:00    0.0000
16    2018-01-01 04:15:00    0.0000
17    2018-01-01 04:30:00    0.0000
18    2018-01-01 04:45:00    0.0000
19    2018-01-01 05:00:00    0.0000
20    2018-01-01 05:15:00    0.0000
21    2018-01-01 05:30:00    0.9392
22    2018-01-01 05:45:00    2.8788
23    2018-01-01 06:00:00    5.5768
24    2018-01-01 06:15:00    8.6660
25    2018-01-01 06:30:00   15.8648
26    2018-01-01 06:45:00   24.1760
27    2018-01-01 07:00:00   38.5324
28    2018-01-01 07:15:00   49.9292
29    2018-01-01 07:30:00   64.3788

I would like to select the values in the energy column that fall within a specific time range, 01:15:00 - 05:30:00, and sum them. To select this data I need both the hour and the minute values. I know how to select data from the column using the hour and the minute separately:

import pandas as pd

# parse_dates makes 'time' a datetime column so the .dt accessor works
energy_data = pd.read_csv("/home/mayukh/Downloads/Northam_january2018/output1.csv", parse_dates=['time'])
# Using the hour only: 01:00 <= hour < 05:00
hour_sum = energy_data[(energy_data.time.dt.hour >= 1) & (energy_data.time.dt.hour < 5)]['energy'].sum()
# Using the minute only: 15 <= minute < 30
minute_sum = energy_data[(energy_data.time.dt.minute >= 15) & (energy_data.time.dt.minute < 30)]['energy'].sum()

But I don't know how to use the hour and the minute together to select the data. How should I proceed?
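Staying with the hour/minute approach above, one way to use both values together is to fold them into a single number, minutes since midnight (hour * 60 + minute), and compare that against the range bounds expressed in the same unit. This is a minimal sketch: the sample frame is rebuilt from the table above with pd.date_range, and minutes_since_midnight is a hypothetical helper, not a column in the original CSV.

```python
import pandas as pd

# Rebuild the sample data from the question: 30 readings every 15 minutes
# starting at 2018-01-01 00:15:00 (the first 21 values are zero).
energy = [0.0] * 21 + [0.9392, 2.8788, 5.5768, 8.6660, 15.8648,
                       24.1760, 38.5324, 49.9292, 64.3788]
df = pd.DataFrame({
    "time": pd.date_range("2018-01-01 00:15:00", periods=30, freq="15min"),
    "energy": energy,
})

# Combine hour and minute into one comparable number (seconds are ignored).
minutes_since_midnight = df["time"].dt.hour * 60 + df["time"].dt.minute

start = 1 * 60 + 15   # 01:15:00
end = 5 * 60 + 30     # 05:30:00

# Inclusive on both ends, matching the 01:15:00 - 05:30:00 range.
mask = (minutes_since_midnight >= start) & (minutes_since_midnight <= end)
total = df.loc[mask, "energy"].sum()
print(total)  # 0.9392
```

Because seconds are dropped, a timestamp such as 05:30:45 would still count as inside the range; if the data has sub-minute resolution, comparing .dt.time values is the safer choice.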

Answer 1:

Use between_time, which works on a DatetimeIndex; create one with set_index:

import pandas as pd

# if necessary, convert the column to datetime
df['time'] = pd.to_datetime(df['time'])
a = df.set_index('time').between_time('01:15:00','05:30:00')['energy'].sum()
print (a)
0.9392

Detail:

print (df.set_index('time').between_time('01:15:00','05:30:00'))
                     energy
time                       
2018-01-01 01:15:00  0.0000
2018-01-01 01:30:00  0.0000
2018-01-01 01:45:00  0.0000
2018-01-01 02:00:00  0.0000
2018-01-01 02:15:00  0.0000
2018-01-01 02:30:00  0.0000
2018-01-01 02:45:00  0.0000
2018-01-01 03:00:00  0.0000
2018-01-01 03:15:00  0.0000
2018-01-01 03:30:00  0.0000
2018-01-01 03:45:00  0.0000
2018-01-01 04:00:00  0.0000
2018-01-01 04:15:00  0.0000
2018-01-01 04:30:00  0.0000
2018-01-01 04:45:00  0.0000
2018-01-01 05:00:00  0.0000
2018-01-01 05:15:00  0.0000
2018-01-01 05:30:00  0.9392
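A self-contained version of this answer follows, with the question's sample data rebuilt via pd.date_range (an assumption that reproduces the table above). It also illustrates that between_time compares only the clock time, so on a multi-day file such as the January data in the question the window is applied to every day. The per-day breakdown with resample('D') is an addition for that case, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question (30 readings, 15 minutes apart).
energy = [0.0] * 21 + [0.9392, 2.8788, 5.5768, 8.6660, 15.8648,
                       24.1760, 38.5324, 49.9292, 64.3788]
df = pd.DataFrame({
    "time": pd.date_range("2018-01-01 00:15:00", periods=30, freq="15min"),
    "energy": energy,
})

# between_time needs a DatetimeIndex, so move 'time' into the index first.
indexed = df.set_index("time")

# Both endpoints are included by default.
window = indexed.between_time("01:15:00", "05:30:00")
total = window["energy"].sum()
print(len(window), total)

# The window matches the clock time on every date, so with a month of data
# resample('D') gives one sum per day instead of a single grand total.
daily = window["energy"].resample("D").sum()
print(daily)
```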

Answer 2:

You can convert your column to datetime and use the .loc accessor with pd.Series.between on the time component of each timestamp (.dt.time):

import pandas as pd
from datetime import datetime

df['time'] = pd.to_datetime(df['time'])

start = datetime.strptime('01:15:00', '%H:%M:%S').time()
end = datetime.strptime('05:30:00', '%H:%M:%S').time()

# compare only the clock time of each timestamp (inclusive on both ends)
result = df.loc[df['time'].dt.time.between(start, end), 'energy'].sum()
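A runnable sketch of this answer on the sample data (again rebuilt with pd.date_range as an assumption). It constructs the bounds directly as datetime.time(1, 15) and time(5, 30), which gives the same values as the strptime(...).time() calls above without parsing strings.

```python
from datetime import time

import pandas as pd

# Rebuild the sample data from the question (30 readings, 15 minutes apart).
energy = [0.0] * 21 + [0.9392, 2.8788, 5.5768, 8.6660, 15.8648,
                       24.1760, 38.5324, 49.9292, 64.3788]
df = pd.DataFrame({
    "time": pd.date_range("2018-01-01 00:15:00", periods=30, freq="15min"),
    "energy": energy,
})

# Build the bounds as datetime.time objects instead of parsing strings.
start = time(1, 15)
end = time(5, 30)

# .dt.time extracts the clock time of each timestamp; between() compares
# those time objects and includes both endpoints by default.
mask = df["time"].dt.time.between(start, end)
result = df.loc[mask, "energy"].sum()
print(result)
```

Unlike between_time, this keeps the default integer index, so the filtered rows can still be joined back to the original frame by position.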

Source: https://stackoverflow.com/questions/49171911/how-to-select-column-for-a-specific-time-range-from-pandas-dataframe-in-python3

Tags

python

python-3.x

pandas

python-datetime