Python - Pandas, group by time intervals

Submitted by 安稳与你 on 2021-01-28 03:22:47

Question


Having the following DF:

group_id                timestamp
       A  2020-09-29 06:00:00 UTC
       A  2020-09-29 08:00:00 UTC
       A  2020-09-30 09:00:00 UTC
       B  2020-09-01 04:00:00 UTC
       B  2020-09-01 06:00:00 UTC

I would like to count the time deltas between consecutive records within each group, pooled across all groups, without counting deltas that span two different groups. The expected result for the example above, with delta in hours:

delta       count
    2           2
   25           1

Explanation: In group A the deltas are

06:00:00 -> 08:00:00 (2 hours)
08:00:00 -> 09:00:00 on the next day (25 hours)

And in group B:

04:00:00 -> 06:00:00 (2 hours)

How can I achieve this using Python Pandas?


Answer 1:


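Use DataFrameGroupBy.diff to compute the differences within each group, convert them to seconds with Series.dt.total_seconds, and divide by 3600 to get hours. Then count the values with Series.value_counts, which drops the missing values produced for the first row of each group by default, and convert the resulting Series into a two-column DataFrame: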

df1 = (df.groupby("group_id")['timestamp']
        .diff()
        .dt.total_seconds()
        .div(3600)
        .value_counts()
        .rename_axis('delta')
        .reset_index(name='count'))
print(df1)
   delta  count
0    2.0      2
1   25.0      1
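
For reference, here is a minimal self-contained sketch of the same approach that builds the example DataFrame first. The construction with pd.to_datetime(..., utc=True) and the sort_values step are assumptions, not part of the original answer: diff() works on row order, so sorting only matters if the rows are not already chronological within each group.

```python
import pandas as pd

# Rebuild the example data from the question (timestamps parsed as UTC).
df = pd.DataFrame({
    "group_id": ["A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2020-09-29 06:00:00",
        "2020-09-29 08:00:00",
        "2020-09-30 09:00:00",
        "2020-09-01 04:00:00",
        "2020-09-01 06:00:00",
    ], utc=True),
})

# Assumption: make sure rows are chronological within each group,
# because diff() subtracts the previous row, not the previous timestamp.
df = df.sort_values(["group_id", "timestamp"])

df1 = (df.groupby("group_id")["timestamp"]
         .diff()                 # NaT for the first row of each group
         .dt.total_seconds()     # timedelta -> seconds (NaT -> NaN)
         .div(3600)              # seconds -> hours
         .value_counts()         # NaN is dropped by default
         .rename_axis("delta")
         .reset_index(name="count"))

print(df1)
```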



Answer 2:


Code

df_out = df.groupby("group_id").diff().groupby("timestamp").size()

# convert to dataframe
df_out = df_out.to_frame().reset_index().rename(columns={"timestamp": "delta", 0: "count"})

Result

print(df_out)
            delta  count
0 0 days 02:00:00      2
1 1 days 01:00:00      1

The NaT values (missing values) that groupby-diff produces for the first row of each group are dropped automatically, because groupby excludes missing keys by default.
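
A small sketch to check this behaviour, assuming the example data from the question rebuilt with pd.to_datetime(..., utc=True). The variable names deltas, n_missing and counts are only illustrative.

```python
import pandas as pd

# Example data from the question (timestamps parsed as UTC).
df = pd.DataFrame({
    "group_id": ["A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2020-09-29 06:00:00",
        "2020-09-29 08:00:00",
        "2020-09-30 09:00:00",
        "2020-09-01 04:00:00",
        "2020-09-01 06:00:00",
    ], utc=True),
})

# Per-group differences: the first row of each group has no predecessor.
deltas = df.groupby("group_id")["timestamp"].diff()

# One NaT per group (two groups here).
n_missing = int(deltas.isna().sum())

# Grouping by the delta values skips the NaT keys by default,
# so only the real deltas are counted.
counts = deltas.to_frame().groupby("timestamp").size()

print(n_missing)
print(counts)
```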

To express the timedeltas in hours, call total_seconds() and divide by 3600:

df_out["delta"] = df_out["delta"].dt.total_seconds() / 3600

print(df_out)
   delta  count
0    2.0      2
1   25.0      1


Source: https://stackoverflow.com/questions/64966109/python-pandas-group-by-time-intervals
