Python: Calculate average for each hour in CSV?

Question

I want to calculate the average for each hour using a CSV file:

Below is my DATA SET:

Timestamp    Temperature
9/1/2016 0:00:08    53.8
9/1/2016 0:00:38    53.8
9/1/2016 0:01:08    53.8
9/1/2016 0:01:38    53.8
9/1/2016 0:02:08    53.8
9/1/2016 0:02:38    54.1
9/1/2016 0:03:08    54.1
9/1/2016 0:03:38    54.1
9/1/2016 0:04:38    54
9/1/2016 0:05:38    54
9/1/2016 0:06:08    54
9/1/2016 0:06:38    54
9/1/2016 0:07:08    54
9/1/2016 0:07:38    54
9/1/2016 0:08:08    54.1
9/1/2016 0:08:38    54.1
9/1/2016 0:09:38    54.1
9/1/2016 0:10:32    54
9/1/2016 0:11:02    54
9/1/2016 0:11:32    54
9/1/2016 0:00:08    54
9/2/2016 0:00:20    32
9/2/2016 0:00:50    32
9/2/2016 0:01:20    32
9/2/2016 0:01:50    32
9/2/2016 0:02:20    32
9/2/2016 0:02:50    32
9/2/2016 0:03:20    32
9/2/2016 0:03:50    32
9/2/2016 0:04:20    32
9/2/2016 0:04:50    32
9/2/2016 0:05:20    32
9/2/2016 0:05:50    32
9/2/2016 0:06:20    32
9/2/2016 0:06:50    32
9/2/2016 0:07:20    32
9/2/2016 0:07:50    32

Here is my code for calculating per day average, but I want per hour:

from datetime import datetime
import pandas

def same_day(date_string):  # Reduce the timestamp to a month/day key (drops year and time)
    return datetime.strptime(date_string, "%m/%d/%Y %H:%M:%S").strftime('%m%d')

# header=0 replaces the file's own header row with the names given here
df = pandas.read_csv('/home/kk/Desktop/cal_Avg.csv', header=0, index_col=0, usecols=[0, 1], names=['Timestamp', 'Discharge'], converters={'Timestamp': same_day})

print(df.groupby(level=0).mean())

My desired output is like:

Timestamp              Temp          *        Avg
9/1/2016 0:00:08    53.8
9/1/2016 0:00:38    53.8    ?avg for this hour
9/1/2016 0:01:08    53.8
9/1/2016 0:01:38    53.8    ?avg for this hour
9/1/2016 0:02:08    53.8
9/1/2016 0:02:38    54.1

Now I want the average for specific hours and minutes.

Desired output:

Here I am printing only 5 hours of output for the dates 01-09-2016 and 02-09-2016:

010900              54.362727         45.497273
010901              54.723276         45.068103
010902              54.746847         45.370270
010903              54.833913         44.931304
010904              54.971053         44.835088
010905              55.519444         44.459259
020901              31.742553         55.640426
020902              31.495556         55.655556
020903              31.304348         55.442609
020904              31.200000         55.437273
020905              31.294382         55.442697

That is, specific dates and their specific hours. How do I achieve this?

Answer 1:

I think you first need read_csv with the parameters index_col=[0], to read the first column into the index, and parse_dates=[0], to parse that column into a DatetimeIndex:

import pandas as pd

df = pd.read_csv('filename', index_col=[0], parse_dates=[0], usecols=[0,1])
print (df)
                     Temperature
Timestamp                       
2016-09-01 00:00:08         53.8
2016-09-01 00:00:38         53.8
2016-09-01 00:01:08         53.8
2016-09-01 00:01:38         53.8
2016-09-01 00:02:08         53.8
2016-09-01 00:02:38         54.1
2016-09-01 00:03:08         54.1
...
...

Then use resample by hour and aggregate with Resampler.mean. Note that hours with no data in the DatetimeIndex come out as NaN:

print (df.resample('H').mean())
                     Temperature
Timestamp                       
2016-09-01 00:00:00    53.980952
2016-09-01 01:00:00          NaN
2016-09-01 02:00:00          NaN
2016-09-01 03:00:00          NaN
2016-09-01 04:00:00          NaN
2016-09-01 05:00:00          NaN
2016-09-01 06:00:00          NaN
2016-09-01 07:00:00          NaN
2016-09-01 08:00:00          NaN
2016-09-01 09:00:00          NaN
2016-09-01 10:00:00          NaN
2016-09-01 11:00:00          NaN
2016-09-01 12:00:00          NaN
2016-09-01 13:00:00          NaN
2016-09-01 14:00:00          NaN
2016-09-01 15:00:00          NaN
2016-09-01 16:00:00          NaN
2016-09-01 17:00:00          NaN
2016-09-01 18:00:00          NaN
2016-09-01 19:00:00          NaN
2016-09-01 20:00:00          NaN
2016-09-01 21:00:00          NaN
2016-09-01 22:00:00          NaN
2016-09-01 23:00:00          NaN
2016-09-02 00:00:00    32.000000
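If the empty hours are not wanted, one option is to drop them after resampling. The following is a minimal sketch, not part of the original answer: the inline `csv_text` sample is made up for illustration and assumes a comma-separated file, and the timestamp format string is an assumption based on the data shown in the question. The lowercase `'h'` alias is used because recent pandas versions prefer it over `'H'`.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the CSV file (assumed comma-separated).
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:30:38,54.0
9/1/2016 1:10:08,54.2
9/2/2016 0:00:20,32.0
9/2/2016 0:45:50,33.0
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse with an explicit format (assumed month/day/year) instead of relying on inference.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S')
df = df.set_index('Timestamp')

# Hourly mean; hours without any readings are NaN and get dropped.
hourly = df.resample('h').mean().dropna()
print(hourly)
```

Only the hours that actually contain readings remain in `hourly`, so the long run of NaN rows between the two days disappears.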

Another solution is to drop the minutes and seconds by casting the index values to hour precision, then group by the resulting array:

print (df.index.values.astype('<M8[h]'))
['2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00']

print (df.groupby([df.index.values.astype('<M8[h]')]).mean())
            Temperature
2016-09-01    53.980952
2016-09-02    32.000000

Also, if you need the mean by month, day and hour, it is possible to group by the result of DatetimeIndex.strftime:

print (df.index.strftime('%m%d%H'))
['090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
 '090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
 '090100' '090100' '090100' '090100' '090100' '090200' '090200' '090200'
 '090200' '090200' '090200' '090200' '090200' '090200' '090200' '090200'
 '090200' '090200' '090200' '090200' '090200']

print (df.groupby([df.index.strftime('%m%d%H')]).mean())
        Temperature
090100    53.980952
090200    32.000000
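The labels in the question's desired output (for example 010900) appear to be day, month, hour rather than month, day, hour. Assuming that reading is right, swapping the format codes to `'%d%m%H'` would reproduce them. This is a sketch on a made-up inline sample (`csv_text` is hypothetical and assumes a comma-separated file):

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the CSV file (assumed comma-separated).
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:30:38,54.0
9/1/2016 1:10:08,54.2
9/2/2016 0:00:20,32.0
9/2/2016 0:45:50,33.0
"""

df = pd.read_csv(io.StringIO(csv_text))
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S')
df = df.set_index('Timestamp')

# Build a day-month-hour string key for every row, then average per key.
labels = df.index.strftime('%d%m%H')
by_label = df.groupby(labels)['Temperature'].mean()
print(by_label)
```

Because the keys are plain strings, they sort lexically; that matches chronological order only within a single month and year.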

Or, if you need the mean by hour of day only (combining all dates), group by DatetimeIndex.hour:

print (df.index.hour)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

print (df.groupby([df.index.hour]).mean())
   Temperature
0    44.475676
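The question also asks for the hourly average next to each row, and for picking out specific dates and hours. A possible sketch of both, under the same assumptions as above (made-up comma-separated sample in `csv_text`; the column name `HourAvg` and the chosen date and hours are hypothetical):

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the CSV file (assumed comma-separated).
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:30:38,54.0
9/1/2016 1:10:08,54.2
9/2/2016 0:00:20,32.0
9/2/2016 0:45:50,33.0
"""

df = pd.read_csv(io.StringIO(csv_text))
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S')
df = df.set_index('Timestamp')

# Start of the hour each reading falls into (date and hour kept, minutes/seconds zeroed).
hour_start = df.index.floor('h')

# transform('mean') returns one value per original row, so it fits as a new column.
df['HourAvg'] = df.groupby(hour_start)['Temperature'].transform('mean')
print(df)

# One row per hour, then keep only chosen hours of a chosen date.
hourly = df.groupby(hour_start)['Temperature'].mean()
on_date = hourly.index.normalize() == pd.Timestamp('2016-09-01')
in_hours = hourly.index.hour.isin([0, 1])
wanted = hourly[on_date & in_hours]
print(wanted)
```

Grouping by the floored timestamp keeps the date in the key, unlike grouping by `df.index.hour` alone, which merges the same hour across different days.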

Answer 2:

I would first define a new column, hour, for readability, then group by it:

import pandas as pd

df = pd.read_csv('/home/kk/Desktop/cal_Avg.csv')
# Strip the trailing ':MM:SS' so that only the date and hour remain
df['hour'] = df['Timestamp'].apply(lambda s: s[:-6])
df[['hour', 'Temperature']].groupby('hour').mean()

Source: https://stackoverflow.com/questions/40256020/python-calculate-average-for-each-hour-in-csv

Tags

python

csv

pandas

average

hour