Aggregate a Dask dataframe and produce a dataframe of aggregates

无人及你 2021-01-19 13:07

I have a Dask dataframe that looks like this:

url     referrer    session_id ts                  customer
url1    ref1        xxx        2017-09-15 00:00:00          


        
2 answers
  •  孤街浪徒
    2021-01-19 13:17

    The GitHub issue that @j-bennet opened gives an additional option. Based on that issue, we implemented the aggregation as follows:

        import itertools
        import dask.dataframe as dd

        custom_agg = dd.Aggregation(
            'custom_agg',
            # chunk: collect each group's values into a set within every partition
            lambda s: s.apply(set),
            # agg: merge the per-partition sets of each group into one de-duplicated list
            lambda s: s.apply(lambda chunks: list(set(itertools.chain.from_iterable(chunks)))),
        )

    To combine this with the count, the code is as follows:

        dfgp = df.groupby(['ID1', 'ID2'])
        df2 = dfgp.assign(cnt=dfgp.size()).agg(custom_agg).reset_index()
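The two-stage logic behind this aggregation (a per-partition step followed by a cross-partition merge) can be illustrated in plain Python, without Dask. This is only a sketch of the idea, not Dask's implementation: the sample rows, the `chunk` and `aggregate` helper names, and the `values` output key are all hypothetical, and the result is sorted purely to make the output deterministic.

```python
import itertools
from collections import defaultdict

# Hypothetical sample data: each inner list stands in for one partition,
# and each row is (ID1, ID2, value). None of this comes from the post.
partitions = [
    [("a", "x", 1), ("a", "x", 2), ("b", "y", 3)],
    [("a", "x", 2), ("b", "y", 4), ("b", "y", 4)],
]


def chunk(rows):
    """Per-partition step: build a set of values and a row count per group."""
    sets = defaultdict(set)
    counts = defaultdict(int)
    for id1, id2, value in rows:
        sets[(id1, id2)].add(value)
        counts[(id1, id2)] += 1
    return sets, counts


def aggregate(chunk_results):
    """Cross-partition step: chain each group's sets and sum its counts."""
    grouped_sets = defaultdict(list)
    totals = defaultdict(int)
    for sets, counts in chunk_results:
        for key, values in sets.items():
            grouped_sets[key].append(values)
        for key, n in counts.items():
            totals[key] += n
    return {
        key: {
            # Same chain-then-deduplicate idea as in custom_agg, sorted for stable output.
            "values": sorted(set(itertools.chain.from_iterable(chunks))),
            "cnt": totals[key],
        }
        for key, chunks in grouped_sets.items()
    }


result = aggregate(chunk(rows) for rows in partitions)
```

Sets are built per partition first because duplicates inside one partition can be dropped early, while duplicates that span partitions are only removed in the final merge. Counts, by contrast, are simply summed.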
