How to group dataframe rows into list in pandas groupby

日久生厌 2020-11-21 04:56

I have a pandas data frame df like:

a b
A 1
A 2
B 5
B 5
B 4
C 6

I want to group by the first column and get the second column as a list for each group.

12 answers
  • 2020-11-21 05:31

    Use any of the following groupby and agg recipes.

    # Setup
    import pandas as pd
    df = pd.DataFrame({
      'a': ['A', 'A', 'B', 'B', 'B', 'C'],
      'b': [1, 2, 5, 5, 4, 6],
      'c': ['x', 'y', 'z', 'x', 'y', 'z']
    })
    df
    
       a  b  c
    0  A  1  x
    1  A  2  y
    2  B  5  z
    3  B  5  x
    4  B  4  y
    5  C  6  z
    

    To aggregate multiple columns as lists, use any of the following:

    df.groupby('a').agg(list)
    df.groupby('a').agg(pd.Series.tolist)
    
               b          c
    a                      
    A     [1, 2]     [x, y]
    B  [5, 5, 4]  [z, x, y]
    C        [6]        [z]
    

    To group-listify a single column only, select that column from the groupby to get a SeriesGroupBy object, then call SeriesGroupBy.agg:

    df.groupby('a').agg({'b': list})  # 4.42 ms 
    df.groupby('a')['b'].agg(list)    # 2.76 ms - faster
    
    a
    A       [1, 2]
    B    [5, 5, 4]
    C          [6]
    Name: b, dtype: object
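
    If a flat DataFrame is more convenient than a Series indexed by group, the single-column recipe above can be followed by reset_index. This is a minimal sketch on the question's data; the variable names grouped, lists and flat are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B', 'B', 'C'],
    'b': [1, 2, 5, 5, 4, 6],
})

# Selecting ['b'] on the groupby yields a SeriesGroupBy,
# so agg(list) returns a Series indexed by the group labels.
grouped = df.groupby('a')['b']
lists = grouped.agg(list)

# reset_index turns the group labels back into an ordinary column.
flat = lists.reset_index()
print(flat)
```

    Here flat has the columns a and b, with one row per group and a Python list in each b cell.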
    
  • 2020-11-21 05:40

    The easiest way I have seen to achieve most of the same thing, at least for one column, is similar to Anamika's answer, just with the tuple syntax for the aggregate function.

    df.groupby('a').agg(b=('b','unique'), c=('c','unique'))
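
    One caveat worth checking: 'unique' keeps only each group's distinct values and returns them as a NumPy array, whereas list keeps duplicates. A small sketch comparing the two with the same tuple syntax (the output column names b_unique and b_list are made up for this example):

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B', 'B', 'C'],
    'b': [1, 2, 5, 5, 4, 6],
})

out = df.groupby('a').agg(
    b_unique=('b', 'unique'),  # NumPy array per group, duplicates removed
    b_list=('b', list),        # Python list per group, duplicates kept
)
print(out)
```

    For group B this should give the distinct values 5 and 4 in b_unique, but the full [5, 5, 4] in b_list.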
    
  • 2020-11-21 05:40

    Here I have grouped the elements with "|" as a separator:

        import pandas as pd

        df = pd.read_csv('input.csv')

        df
        Out[1]:
           Area  Keywords
        0     A         1
        1     A         2
        2     B         5
        3     B         5
        4     B         4
        5     C         6

        df.dropna(inplace=True)
        # Normalise the group key: lower-case it and strip whitespace
        df['Area'] = df['Area'].apply(lambda x: x.lower().strip())
        print(df.columns)
        # Cast to str first: str.join raises TypeError on integers
        df_op = df.groupby('Area').agg({"Keywords": lambda x: "|".join(x.astype(str))})

        df_op.to_csv('output.csv')
        df_op
        Out[2]:
              Keywords
        Area
        a          1|2
        b        5|5|4
        c            6
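
    The same idea can be tried without any CSV files. The sketch below builds the frame in memory (an assumption standing in for input.csv) and casts the values to strings before joining, since str.join only accepts strings.

```python
import pandas as pd

# In-memory stand-in for the CSV file used above
df = pd.DataFrame({
    'Area': ['A', 'A', 'B', 'B', 'B', 'C'],
    'Keywords': [1, 2, 5, 5, 4, 6],
})

joined = (
    df.assign(Keywords=df['Keywords'].astype(str))  # integers -> strings
      .groupby('Area')['Keywords']
      .agg('|'.join)                                # one "|"-separated string per group
)
print(joined.to_dict())
```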
    
  • 2020-11-21 05:41

    A handy way to achieve this would be:

    df.groupby('a').agg({'b':lambda x: list(x)})
    

    Look into writing Custom Aggregations: https://www.kaggle.com/akshaysehgal/how-to-group-by-aggregate-using-py
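
    As a rough illustration of a custom aggregation, any function that takes one group's values and returns a single object can be passed to agg. The helper sorted_unique below is hypothetical, not something taken from the linked notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B', 'B', 'C'],
    'b': [1, 2, 5, 5, 4, 6],
})

def sorted_unique(values):
    """Hypothetical helper: one group's values as a sorted list without duplicates."""
    return sorted(set(values))

# agg calls sorted_unique once per group and stores the returned list
out = df.groupby('a')['b'].agg(sorted_unique)
print(out)
```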

  • 2020-11-21 05:42

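    If performance is important, go down to the numpy level:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': np.random.randint(0, 60, 600), 'b': [1, 2, 5, 5, 4, 6]*100})

    def f(df):
        # Stable sort so rows keep their original order within each group
        keys, values = df.sort_values('a', kind='mergesort').values.T
        # ukeys: sorted unique keys; index: position where each key first appears
        ukeys, index = np.unique(keys, return_index=True)
        # Split the values at every group boundary
        arrays = np.split(values, index[1:])
        df2 = pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})
        return df2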
    

    Tests:

    In [301]: %timeit f(df)
    1000 loops, best of 3: 1.64 ms per loop
    
    In [302]: %timeit df.groupby('a')['b'].apply(list)
    100 loops, best of 3: 5.26 ms per loop
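
    Before relying on the numpy route for speed, it may be worth confirming that it produces the same groups as groupby. The sketch below does that on the question's small frame; group_lists_numpy is a hypothetical name, and it assumes a key column 'a' and a value column 'b'.

```python
import numpy as np
import pandas as pd

def group_lists_numpy(df):
    # Stable sort keeps the original row order inside each group
    ordered = df.sort_values('a', kind='mergesort')
    keys = ordered['a'].to_numpy()
    values = ordered['b'].to_numpy()
    # first[i] is the position where the i-th unique key starts
    ukeys, first = np.unique(keys, return_index=True)
    chunks = np.split(values, first[1:])
    return pd.Series([list(c) for c in chunks],
                     index=pd.Index(ukeys, name='a'), name='b')

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

fast = group_lists_numpy(df)
reference = df.groupby('a')['b'].apply(list)

# Compare as plain dicts to sidestep dtype differences between the two results
print(fast.to_dict() == reference.to_dict())
```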
    
  • 2020-11-21 05:43

    To solve this for several columns of a dataframe:

    In [5]: df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
       ...:                    'b': [1, 2, 5, 5, 4, 6],
       ...:                    'c': [3, 3, 3, 4, 4, 4]})
    
    In [6]: df
    Out[6]: 
       a  b  c
    0  A  1  3
    1  A  2  3
    2  B  5  3
    3  B  5  4
    4  B  4  4
    5  C  6  4
    
    In [7]: df.groupby('a').agg(lambda x: list(x))
    Out[7]: 
               b          c
    a                      
    A     [1, 2]     [3, 3]
    B  [5, 5, 4]  [3, 4, 4]
    C        [6]        [4]
    

    This answer was inspired by Anamika Modi's answer. Thank you!
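
    When the frame has more columns than you want listified, one option is to select the wanted columns on the groupby first. A minimal sketch, assuming an extra column d (not in the original data) that should be left out:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B', 'B', 'C'],
    'b': [1, 2, 5, 5, 4, 6],
    'c': [3, 3, 3, 4, 4, 4],
    'd': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # hypothetical column we do not want in the result
})

# Only b and c are aggregated; d is dropped by the column selection
out = df.groupby('a')[['b', 'c']].agg(list).reset_index()
print(out)
```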
