Question
This is my pandas DataFrame:
time energy
0 2018-01-01 00:15:00 0.0000
1 2018-01-01 00:30:00 0.0000
2 2018-01-01 00:45:00 0.0000
3 2018-01-01 01:00:00 0.0000
4 2018-01-01 01:15:00 0.0000
5 2018-01-01 01:30:00 0.0000
6 2018-01-01 01:45:00 0.0000
7 2018-01-01 02:00:00 0.0000
8 2018-01-01 02:15:00 0.0000
9 2018-01-01 02:30:00 0.0000
10 2018-01-01 02:45:00 0.0000
11 2018-01-01 03:00:00 0.0000
12 2018-01-01 03:15:00 0.0000
13 2018-01-01 03:30:00 0.0000
14 2018-01-01 03:45:00 0.0000
15 2018-01-01 04:00:00 0.0000
16 2018-01-01 04:15:00 0.0000
17 2018-01-01 04:30:00 0.0000
18 2018-01-01 04:45:00 0.0000
19 2018-01-01 05:00:00 0.0000
20 2018-01-01 05:15:00 0.0000
21 2018-01-01 05:30:00 0.9392
22 2018-01-01 05:45:00 2.8788
23 2018-01-01 06:00:00 5.5768
24 2018-01-01 06:15:00 8.6660
25 2018-01-01 06:30:00 15.8648
26 2018-01-01 06:45:00 24.1760
27 2018-01-01 07:00:00 38.5324
28 2018-01-01 07:15:00 49.9292
29 2018-01-01 07:30:00 64.3788
I would like to select the values in the energy column that fall within a specific time range, 01:15:00 to 05:30:00, and sum them. To select the data I need both the hour and the minute values. I know how to select data from the column using the hour and the minute separately:
import pandas as pd
# parse_dates converts the text column to datetime64 so the .dt accessor works
energy_data = pd.read_csv("/home/mayukh/Downloads/Northam_january2018/output1.csv", index_col=None, parse_dates=['time'])
# Using hour: rows whose hour is 1 up to (but not including) 5
total = energy_data[(energy_data.time.dt.hour >= 1) & (energy_data.time.dt.hour < 5)]['energy'].sum()
# Using minute: rows whose minute is 15 up to (but not including) 30
total = energy_data[(energy_data.time.dt.minute >= 15) & (energy_data.time.dt.minute < 30)]['energy'].sum()
But I don't know how to use the hour and the minute together to select the data. How can I proceed?
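One way to combine the hour and the minute is to turn each timestamp into minutes since midnight and compare against a single numeric range. This is a minimal sketch, not taken from the thread: it rebuilds the sample rows shown above as a hypothetical DataFrame and assumes the timestamps fall on whole minutes (seconds are ignored).

```python
import pandas as pd

# Hypothetical sample data mirroring the 30 rows shown in the question
# (15-minute steps starting at 2018-01-01 00:15:00).
times = pd.date_range("2018-01-01 00:15:00", periods=30, freq="15min")
energy = [0.0] * 21 + [0.9392, 2.8788, 5.5768, 8.6660, 15.8648,
                       24.1760, 38.5324, 49.9292, 64.3788]
df = pd.DataFrame({"time": times, "energy": energy})

# Combine hour and minute into one number: minutes since midnight.
minutes = df["time"].dt.hour * 60 + df["time"].dt.minute

start = 1 * 60 + 15   # 01:15 -> 75
end = 5 * 60 + 30     # 05:30 -> 330

# Inclusive on both ends, like between_time's default.
mask = (minutes >= start) & (minutes <= end)
total = df.loc[mask, "energy"].sum()
print(total)
```

On this sample the mask keeps the 18 rows from 01:15 to 05:30, and only the 05:30 row is non-zero.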
Answer 1:
Use between_time, which works with a DatetimeIndex created by set_index:
# if necessary, convert to datetime
df['time'] = pd.to_datetime(df['time'])
a = df.set_index('time').between_time('01:15:00','05:30:00')['energy'].sum()
print(a)
0.9392
Detail:
print(df.set_index('time').between_time('01:15:00','05:30:00'))
energy
time
2018-01-01 01:15:00 0.0000
2018-01-01 01:30:00 0.0000
2018-01-01 01:45:00 0.0000
2018-01-01 02:00:00 0.0000
2018-01-01 02:15:00 0.0000
2018-01-01 02:30:00 0.0000
2018-01-01 02:45:00 0.0000
2018-01-01 03:00:00 0.0000
2018-01-01 03:15:00 0.0000
2018-01-01 03:30:00 0.0000
2018-01-01 03:45:00 0.0000
2018-01-01 04:00:00 0.0000
2018-01-01 04:15:00 0.0000
2018-01-01 04:30:00 0.0000
2018-01-01 04:45:00 0.0000
2018-01-01 05:00:00 0.0000
2018-01-01 05:15:00 0.0000
2018-01-01 05:30:00 0.9392
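between_time applies the time-of-day window to every day in the index, so if the CSV spans more than one day (the file name hints at a month of January 2018 data, though the excerpt shows only one morning), the sum covers all days. A minimal sketch, assuming hypothetical two-day data with a constant energy of 1.0 so each selected sample contributes exactly 1, and grouping by calendar date for per-day totals:

```python
import pandas as pd

# Hypothetical two-day series at 15-minute resolution, constant energy 1.0.
idx = pd.date_range("2018-01-01 00:00:00", "2018-01-02 23:45:00", freq="15min")
df = pd.DataFrame({"time": idx, "energy": 1.0})

# between_time needs a DatetimeIndex; both bounds are included by default.
window = df.set_index("time").between_time("01:15:00", "05:30:00")

# Total over all days that have samples in the window.
total = window["energy"].sum()

# Per-day totals: group the selected rows by calendar date.
per_day = window["energy"].groupby(window.index.date).sum()
print(total)
print(per_day)
```

Each day contributes 18 samples (01:15 through 05:30 inclusive), so the overall total is twice the per-day value.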
Answer 2:
You can convert your column to datetime and use the .loc accessor with pd.Series.between:
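from datetime import datetime
import pandas as pd
df['time'] = pd.to_datetime(df['time'])
# build datetime.time bounds to compare against the time-of-day of each row
start = datetime.strptime('01:15:00', '%H:%M:%S').time()
end = datetime.strptime('05:30:00', '%H:%M:%S').time()
# .dt.time extracts the time of day; between() is inclusive on both ends by default
result = df.loc[df['time'].dt.time.between(start, end), 'energy'].sum()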
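Because the data comes from a CSV, the time column starts out as text, and the .dt accessor only works after conversion. This minimal sketch parses the column at load time with read_csv(parse_dates=...) and builds the bounds directly with datetime.time instead of strptime; the CSV excerpt is a hypothetical stand-in for output1.csv using a few rows from the question.

```python
import io
from datetime import time

import pandas as pd

# Hypothetical CSV excerpt standing in for output1.csv.
csv_text = """time,energy
2018-01-01 01:00:00,0.0000
2018-01-01 01:15:00,0.0000
2018-01-01 05:15:00,0.0000
2018-01-01 05:30:00,0.9392
2018-01-01 05:45:00,2.8788
"""

# parse_dates turns the text column into datetime64 so .dt works.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# datetime.time bounds; between() compares with >= and <= by default.
start, end = time(1, 15), time(5, 30)
mask = df["time"].dt.time.between(start, end)
result = df.loc[mask, "energy"].sum()
print(result)
```

Here the 01:00 and 05:45 rows fall outside the window, so three rows are kept and the sum is 0.9392.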
Source: https://stackoverflow.com/questions/49171911/how-to-select-column-for-a-specific-time-range-from-pandas-dataframe-in-python3