How do I improve the performance of pandas GroupBy filter operation?

醉话见心 2021-01-13 00:50

This is my first time asking a question.

I'm working with a large CSV dataset (over 15 million rows and more than 1.5 GB in size).

I'm loading th

1 Answer
  •  伪装坚强ぢ
    2021-01-13 01:07

    GroupBy.filter is generally slow because it calls a Python function once per group. If you only need to keep the rows whose group satisfies a condition, a faster alternative is to compute the per-group value with transform or map and use the result as a boolean mask:

    df[df.groupby('mac')['latency'].transform('count').gt(1)]
    

    df[df['mac'].map(df.groupby('mac')['latency'].count()).gt(1)]
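
To see how the two expressions above relate to a filter call, here is a minimal sketch on a small made-up DataFrame. Only the column names mac and latency come from the snippets above; the sample values and the filter lambda are assumptions, since the original question is cut off before it shows its code.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
df = pd.DataFrame({
    "mac": ["aa", "aa", "bb", "cc", "cc", "cc"],
    "latency": [10.0, 12.5, 7.0, 3.0, 4.5, 5.0],
})

# Assumed baseline: keep only rows whose 'mac' has more than one latency value.
# The lambda runs once per group in Python, which is what makes filter slow.
slow = df.groupby("mac").filter(lambda g: g["latency"].count() > 1)

# transform: broadcast each group's count back onto every row, then mask.
counts = df.groupby("mac")["latency"].transform("count")
fast_transform = df[counts.gt(1)]

# map: compute one count per 'mac', then look it up for every row.
per_mac = df.groupby("mac")["latency"].count()
fast_map = df[df["mac"].map(per_mac).gt(1)]

print(fast_transform)
```

Note that count ignores missing latency values. If groups should be judged by their number of rows instead, transform('size') is the closer match.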
    
