get first and last values in a groupby

野性不改 2020-12-02 23:34

I have a DataFrame df:

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c


        
3 answers
  • 2020-12-02 23:54

    This could be one of the easy solutions.

    df.groupby(level=0, as_index=False).nth([0, -1])
    
          X   Y
    a a   0   1
      d   6   7
    b e   8   9
      g  12  13
    c h  14  15
      i  16  17
    d j  18  19
    

    Hope this helps. (Y)
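
    For anyone who wants to run this end to end, here is a minimal sketch. It assumes the sample frame is the one built in the third answer (the printed output above matches it); the exact index layout of the result may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Sample frame, assumed to match the one used elsewhere in this thread.
df = pd.DataFrame(
    np.arange(20).reshape(10, -1),
    index=[list("aaaabbbccd"), list("abcdefghij")],
    columns=list("XY"),
)

# Rows at positions 0 and -1 of every group on index level 0.
# A single-row group (here "d") contributes its row only once.
result = df.groupby(level=0, as_index=False).nth([0, -1])
print(result)
```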

  • 2020-12-02 23:58

    Please try this:

    For the last value: df.groupby('Column_name').nth(-1)

    For the first value: df.groupby('Column_name').nth(0)
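
    A small sketch of the two calls, on made-up data (the `key` and `val` column names are assumptions, not from the question). It also shows `first()` next to `nth(0)`, since the two differ once a group starts with a missing value: `nth` picks rows by position, while `first()` skips NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" starts with a missing value.
df = pd.DataFrame({"key": ["a", "a", "b", "b"],
                   "val": [np.nan, 1.0, 2.0, 3.0]})

grouped = df.groupby("key")

first_rows = grouped.nth(0)     # first row of each group, NaN included
last_rows = grouped.nth(-1)     # last row of each group
first_valid = grouped.first()   # first non-NA value per column

print(first_rows["val"].tolist())
print(last_rows["val"].tolist())
print(first_valid["val"].tolist())
```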

  • 2020-12-03 00:04

    Option 1

    def first_last(df):
        # first and last rows by position (.ix was removed in pandas 1.0)
        return df.iloc[[0, -1]]
    
    df.groupby(level=0, group_keys=False).apply(first_last)
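
    One thing to watch with Option 1: positions 0 and -1 are the same row in a one-row group, so that row comes back twice (group d in the test frame from the Note below). A possible variant, as a sketch only; `first_last_unique` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(10, -1),
    index=[list("aaaabbbccd"), list("abcdefghij")],
    columns=list("XY"),
)

def first_last_unique(group):
    # Hypothetical helper: a one-row group is returned as-is, so its
    # only row is not selected twice (positions 0 and -1 coincide).
    if len(group) == 1:
        return group
    return group.iloc[[0, -1]]

result = df.groupby(level=0, group_keys=False).apply(first_last_unique)
print(result)
```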
    


    Option 2 - only works if index is unique

    idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
    df.loc[idx]
    

    Option 3 - per notes below, this only makes sense when there are no NAs

    I also abused the agg function. The code below works, but is far uglier.

    df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
        .set_index('level_1', append=True).reset_index(1, drop=True) \
        .rename_axis([None, None])
    

    Note

    per @unutbu: agg(['first', 'last']) takes the first (and last) non-NA values.

    I take this to mean that the aggregation runs column by column. If so, forcing index level 1 to align with the result may not even make sense.

    Let's run another test, this time with a missing value:

    df = pd.DataFrame(np.arange(20).reshape(10, -1),
                      [list('aaaabbbccd'),
                       list('abcdefghij')],
                      list('XY'))
    
    df.loc[tuple('aa'), 'X'] = np.nan
    

    def first_last(df):
        # first and last rows by position (.ix was removed in pandas 1.0)
        return df.iloc[[0, -1]]
    
    df.groupby(level=0, group_keys=False).apply(first_last)
    

    df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
        .set_index('level_1', append=True).reset_index(1, drop=True) \
        .rename_axis([None, None])
    

    Sure enough! The second solution takes the first valid value in column X. It no longer makes sense to force that value to align with the index label a.
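
    To see the difference in isolation, here is a compact sketch. It uses `head(1)` as a stand-in for the positional selection and `first()` for the aggregation, and it builds the frame as floats (my assumption, to keep the NaN assignment from changing the dtype):

```python
import numpy as np
import pandas as pd

# Built as floats so that a NaN can be assigned without changing dtype.
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(10, -1),
    index=[list("aaaabbbccd"), list("abcdefghij")],
    columns=list("XY"),
)
df.loc[("a", "a"), "X"] = np.nan

# Positional: the actual first row of each group, NaN and all.
positional_first = df.groupby(level=0).head(1)

# Aggregated: first non-NA value, taken separately for each column.
valid_first = df.groupby(level=0).first()

print(positional_first)
print(valid_first)
```

    For group a, `valid_first` combines X from row (a, b) with Y from row (a, a), so it corresponds to no single row of the frame.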
