Aggregate a Dask dataframe and produce a dataframe of aggregates

无人及你 2021-01-19 13:07

I have a Dask dataframe that looks like this:

url     referrer    session_id ts                  customer
url1    ref1        xxx        2017-09-15 00:00:00          


        
2 Answers
  •  别那么骄傲
    2021-01-19 13:17

    The following does indeed work:

    import pandas as pd

    gb = df.groupby(['customer', 'url', 'ts'])
    # One row per group: view count, unique visitors, and the list of referrers
    gb.apply(lambda d: pd.DataFrame({'views': len(d),
         'visitors': d.session_id.nunique(),
         'referrers': [d.referrer.tolist()]})).reset_index()
    

    (This assumes visitors should be unique, as in the SQL above.) You may also wish to define the meta of the output, so that Dask does not have to infer it.
