Pandas: Difference between largest and smallest value within group

后端 未结 3 1705
心在旅途
心在旅途 2020-11-30 02:45

Given a data frame that looks like this

GROUP VALUE
  1     5
  2     2
  1     10
  2     20
  1     7

I would like to compute the differe

相关标签:
3条回答
  • 2020-11-30 02:48

    Using @unutbu 's df

    per timing
    unutbu's solution is best over large data sets

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})
    
    df.groupby('GROUP')['VALUE'].agg(np.ptp)
    
    GROUP
    1     5
    2    18
    Name: VALUE, dtype: int64
    

    np.ptp docs returns the range of an array


    timing
    small df

    large df
    df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 100, VALUE=np.random.rand(1000000)))

    large df
    many groups
    df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))

    0 讨论(0)
  • 2020-11-30 02:55

    groupby/agg generally performs best when you take advantage of the built-in aggregators such as 'max' and 'min'. So to obtain the difference, first compute the max and min and then subtract:

    import pandas as pd
    df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})
    result = df.groupby('GROUP')['VALUE'].agg(['max','min'])
    result['diff'] = result['max']-result['min']
    print(result[['diff']])
    

    yields

           diff
    GROUP      
    1         5
    2        18
    
    0 讨论(0)
  • 2020-11-30 02:57

    You can use groupby(), min(), and max():

    df.groupby('GROUP')['VALUE'].apply(lambda g: g.max() - g.min())
    
    0 讨论(0)
提交回复
热议问题