Using Pandas groupby to calculate many slopes

盖世英雄少女心 2021-01-07 00:36

Some illustrative data in a DataFrame (MultiIndex) format:

|entity| year |value|
+------+------+-----+
|  a   | 1999 |  2  |
|      | 2004 |  5  |
|  b   | 20

2 Answers
  • 2021-01-07 00:48

    You can apply a function to each group with the groupby object's apply method. Here the function passed in wraps scipy.stats.linregress and keeps only the slope, which is the first element of its result:

    In [4]: x = pd.DataFrame({'entity':['a','a','b','b','b'],
                              'year':[1999,2004,2003,2007,2014],
                              'value':[2,5,3,2,7]})
    
    In [5]: x
    Out[5]: 
      entity  value  year
    0      a      2  1999
    1      a      5  2004
    2      b      3  2003
    3      b      2  2007
    4      b      7  2014
    
    
    In [6]: from scipy.stats import linregress
    
    In [7]: x.groupby('entity').apply(lambda v: linregress(v.year, v.value)[0])
    Out[7]: 
    entity
    a    0.600000
    b    0.403226
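
    If more than the slope is wanted, the same apply call can return a Series per group, which pandas assembles into one row per entity. The following is a minimal sketch under that assumption, not part of the original answer; the helper name fit_line is hypothetical, and the year and value columns are selected explicitly so the grouping column is not passed to the helper.

```python
import pandas as pd
from scipy.stats import linregress

x = pd.DataFrame({'entity': ['a', 'a', 'b', 'b', 'b'],
                  'year': [1999, 2004, 2003, 2007, 2014],
                  'value': [2, 5, 3, 2, 7]})

def fit_line(group):
    # fit_line is a hypothetical helper: fit value against year for one entity
    result = linregress(group['year'], group['value'])
    # Return named fields so each becomes a column in the combined result
    return pd.Series({'slope': result.slope, 'intercept': result.intercept})

# One row per entity, with 'slope' and 'intercept' columns
fits = x.groupby('entity')[['year', 'value']].apply(fit_line)
print(fits)
```

    The slope column should match the values shown in Out[7] above.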
    
  • 2021-01-07 01:01

    You can also iterate over the groupby object, which yields one (name, group) pair per group. It is easiest to drop the current MultiIndex with reset_index() first and then group by 'entity'.

    A list comprehension is then a quick way to work through all the groups. Alternatively, a dict comprehension keeps each group's label next to its slope, and the resulting dict can be passed straight to pd.DataFrame.

    import pandas as pd
    import scipy.stats
    
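    # This is your data, indexed by (entity, year)
    test = pd.DataFrame({'entity': ['a', 'a', 'b', 'b', 'b'],
                         'year': [1999, 2004, 2003, 2007, 2014],
                         'value': [2, 5, 3, 2, 7]}).set_index(['entity', 'year'])
    
    # Create the groups. A string key is used rather than a one-element list,
    # because recent pandas versions yield tuple names such as ('a',) for list keys.
    groups = test.reset_index().groupby('entity')
    
    # Process groups with a list comprehension (slopes only, in group order)
    slope_list = [scipy.stats.linregress(group.year, group.value)[0] for name, group in groups]
    # Process groups with a dict comprehension (labels kept as keys);
    # each slope is wrapped in a list so the dict can be passed to pd.DataFrame
    slope_dict = {name: [scipy.stats.linregress(group.year, group.value)[0]] for name, group in groups}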
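
    To illustrate the last step, putting the dict into a DataFrame, here is a self-contained sketch. It assumes the same data as above; the names wide and tall are illustrative only. Passing the dict directly gives one column per entity, while DataFrame.from_dict with orient='index' gives one row per entity.

```python
import pandas as pd
import scipy.stats

test = pd.DataFrame({'entity': ['a', 'a', 'b', 'b', 'b'],
                     'year': [1999, 2004, 2003, 2007, 2014],
                     'value': [2, 5, 3, 2, 7]}).set_index(['entity', 'year'])

groups = test.reset_index().groupby('entity')

# Each slope is wrapped in a one-element list so it can serve as a column or row
slopes = {name: [scipy.stats.linregress(group.year, group.value)[0]]
          for name, group in groups}

# Wide layout: a single row with one column per entity
wide = pd.DataFrame(slopes)

# Tall layout: one row per entity and a single 'slope' column
tall = pd.DataFrame.from_dict(slopes, orient='index', columns=['slope'])
print(tall)
```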
    