Find maximum value of each day from hourly data

≯℡__Kan透↙ submitted on 2021-01-27 21:12:10

Question


I am having trouble getting the maximum value for each day from hourly data. The original file contains 24 rows per name per day (and there are many names). As an example, here are the 24 rows for one name:

Start Time  Period  name    value
2/23/2019 0:00  60  MBTS_H2145X 100
2/23/2019 1:00  60  MBTS_H2145X 100
2/23/2019 2:00  60  MBTS_H2145X 1
2/23/2019 3:00  60  MBTS_H2145X 1
2/23/2019 4:00  60  MBTS_H2145X 1
2/23/2019 5:00  60  MBTS_H2145X 2324
2/23/2019 6:00  60  MBTS_H2145X 2323
2/23/2019 7:00  60  MBTS_H2145X 2323
2/23/2019 8:00  60  MBTS_H2145X 2323
2/23/2019 9:00  60  MBTS_H2145X 2323
2/23/2019 10:00 60  MBTS_H2145X 2323
2/23/2019 11:00 60  MBTS_H2145X 2323
2/23/2019 12:00 60  MBTS_H2145X 1
2/23/2019 13:00 60  MBTS_H2145X 21
2/23/2019 14:00 60  MBTS_H2145X 21
2/23/2019 15:00 60  MBTS_H2145X 23
2/23/2019 16:00 60  MBTS_H2145X 350
2/23/2019 17:00 60  MBTS_H2145X 323
2/23/2019 18:00 60  MBTS_H2145X 23
2/23/2019 19:00 60  MBTS_H2145X 23
2/23/2019 20:00 60  MBTS_H2145X 2323
2/23/2019 21:00 60  MBTS_H2145X 23
2/23/2019 22:00 60  MBTS_H2145X 23
2/23/2019 23:00 60  MBTS_H2145X 2

The result I get is shown below. It is wrong; the maximum should be 2324:

    Start Time  name    max value
0   2/23/2019   MBTS_H2145X 350

I am using the code below, but it gives the wrong result:

import dask.dataframe as dd
import numpy as np
import pandas as pd

filename='V.csv'
df = dd.read_csv(filename, dtype='str')


#_________changing date format 
df['Start Time'] = df['Start Time'].map(lambda x: pd.to_datetime(x, errors='coerce'))
#_________change to pure date without hour
df['Start Time'] = df['Start Time'].dt.date


grouped_df=(df.groupby(['Start Time','name']).agg({'value':'max'}).rename(columns={'value':'max value'}).reset_index())

grouped_df.to_csv('e1.csv')

print(grouped_df.head(12))


Answer 1:


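Keep the rest of your code exactly as it is, and replace this line:

grouped_df=(df.groupby(['Start Time','name']).agg({'value':'max'}).rename(columns={'value':'max value'}).reset_index())

with:

# 'value' was read as a string, so convert it to a number before aggregating
# (dd.to_numeric is the Dask counterpart of pd.to_numeric, since df is a Dask dataframe)
df['value'] = dd.to_numeric(df['value'])

# Daily maximum per name; a Series is renamed by passing the new name directly
grouped_df = (df.groupby(['Start Time','name'])['value'].max().rename('max value').reset_index())

# Optional: attach the daily maximum to every hourly row
df = df.merge(grouped_df, on=['Start Time','name'])

The most likely cause is the dtype rather than the aggregate function. The file is read with dtype='str', so max compares the values as text, and '350' sorts after '2324' because '3' comes after '2'.

If the string dtype is the only problem, just add the to_numeric line and keep everything else the same, including the original agg call.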
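The effect of the string dtype can be reproduced with a minimal sketch. It uses plain pandas rather than Dask so that it runs without a CSV file, and the five rows are a hand-picked subset of the question's data, so the setup is an assumption for illustration and not the asker's actual pipeline.

```python
import pandas as pd

# Hypothetical sample: a few hourly rows from the question, with 'value' kept as text
rows = [
    ("2/23/2019 4:00", "MBTS_H2145X", "1"),
    ("2/23/2019 5:00", "MBTS_H2145X", "2324"),
    ("2/23/2019 6:00", "MBTS_H2145X", "2323"),
    ("2/23/2019 16:00", "MBTS_H2145X", "350"),
    ("2/23/2019 17:00", "MBTS_H2145X", "323"),
]
df = pd.DataFrame(rows, columns=["Start Time", "name", "value"])

# Reduce each timestamp to its calendar date
df["Start Time"] = pd.to_datetime(
    df["Start Time"], format="%m/%d/%Y %H:%M"
).dt.date

# While 'value' holds strings, max compares text character by character
string_max = df.groupby(["Start Time", "name"])["value"].max()

# After conversion to numbers, max compares magnitudes
df["value"] = pd.to_numeric(df["value"])
numeric_max = (
    df.groupby(["Start Time", "name"])["value"]
    .max()
    .rename("max value")
    .reset_index()
)

print(string_max.iloc[0])                # '350', the wrong result from the question
print(numeric_max["max value"].iloc[0])  # 2324, the expected daily maximum
```

The only difference between the two groupby calls is the dtype of the value column, which suggests the conversion is the essential fix and the choice between agg and max is secondary.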



Source: https://stackoverflow.com/questions/55307132/find-maximum-value-of-each-day-from-hourly-data
