Collapsing rows in a Pandas dataframe

梦谈多话 2021-02-09 20:27

I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, b…

2 Answers
  • 2021-02-09 20:37

    You can use groupby and aggregate with ''.join, sum or max:

    # if blank values are NaN, first replace them with ''
    df = df.fillna('')
    
    df = df.groupby('ID').agg(''.join)
    print (df)
         apples  pears  oranges
    ID                         
    101                 oranges
    134  apples  pears         
    576          pears  oranges
    837  apples   
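A self-contained version of the ''.join approach for trying it out. The input frame is hypothetical: the question's data is cut off, so these rows were made up to reproduce the output above, with None standing in for blank cells.

```python
import pandas as pd

# Hypothetical input (the question's data is truncated): one fruit name per
# cell, None for blanks, chosen to reproduce the output above.
df = pd.DataFrame({
    'ID':      [101, 134, 134, 576, 576, 837],
    'apples':  [None, 'apples', None, None, None, 'apples'],
    'pears':   [None, None, 'pears', 'pears', None, None],
    'oranges': ['oranges', None, None, None, 'oranges', None],
})

# str.join only accepts strings, so missing values become '' first;
# then each column is joined within each ID group.
collapsed = df.fillna('').groupby('ID').agg(''.join)
print(collapsed)
```

Without the fillna step, ''.join would raise a TypeError on the NaN values in each group.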
    

    This also works:

    df = df.fillna('')
    df = df.groupby('ID').sum()
    # alternatively, max
    #df = df.groupby('ID').max()
    print (df)
         apples  pears  oranges
    ID                         
    101                 oranges
    134  apples  pears         
    576          pears  oranges
    837  apples     
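A sketch of why sum and max both work here, on a hypothetical frame whose blanks are already '': on object columns, sum concatenates the strings in each group, while max keeps the lexicographically largest one, and '' compares smaller than any non-empty string.

```python
import pandas as pd

# Hypothetical input (made up to match the output above); blanks are ''.
df = pd.DataFrame({
    'ID':      [101, 134, 134, 576, 576, 837],
    'apples':  ['', 'apples', '', '', '', 'apples'],
    'pears':   ['', '', 'pears', 'pears', '', ''],
    'oranges': ['oranges', '', '', '', 'oranges', ''],
})

by_sum = df.groupby('ID').sum()  # concatenates the strings in each group
by_max = df.groupby('ID').max()  # keeps the largest string; '' < 'apples'

# With at most one non-blank value per group and column, the two agree.
print((by_sum == by_max).all().all())
```

The two agree only while each ID has at most one non-blank value per column; if one group held 'apples' and 'pears' in the same column, sum would concatenate them and max would keep only 'pears'.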
    

    If you also need to remove duplicates within each group and column, add unique:

    df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
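A small hypothetical case showing what unique changes: ID 134 has 'apples' on two rows, so the plain join repeats the value while the unique variant keeps a single copy. unique returns values in order of first appearance, so the joined result keeps that order.

```python
import pandas as pd

# Hypothetical input in which ID 134 has 'apples' on two rows.
df = pd.DataFrame({
    'ID':     [134, 134, 134],
    'apples': ['apples', 'apples', ''],
    'pears':  ['', '', 'pears'],
})

plain = df.groupby('ID').agg(''.join)
deduped = df.groupby('ID').agg(lambda x: ''.join(x.unique()))

print(plain.loc[134, 'apples'])    # prints applesapples
print(deduped.loc[134, 'apples'])  # prints apples
```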
    
  • 2021-02-09 20:41

    Assuming blanks are empty strings (''):

    Option 1: pivot_table

    df.pivot_table(['apples', 'pears', 'oranges'], 'ID', aggfunc=''.join)
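A runnable sketch of option 1 on a hypothetical frame (the question's data is truncated, so these rows are made up to match the result the answer shows). pivot_table groups by the index column and applies aggfunc to each listed value column, which here joins the strings per ID; the arguments are passed by keyword only for readability.

```python
import pandas as pd

# Hypothetical input with blanks stored as ''.
df = pd.DataFrame({
    'ID':      [101, 134, 134, 576, 576, 837],
    'apples':  ['', 'apples', '', '', '', 'apples'],
    'pears':   ['', '', 'pears', 'pears', '', ''],
    'oranges': ['oranges', '', '', '', 'oranges', ''],
})

# values = the three fruit columns, index = 'ID', strings joined per group.
wide = df.pivot_table(values=['apples', 'pears', 'oranges'], index='ID',
                      aggfunc=''.join)
print(wide)
```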
    

    Option 2: sort each column within a group and take the last row, since '' sorts before any non-empty string.

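    import numpy as np
    import pandas as pd

    def f(df):
        # sort each column of the group; '' sorts first, so the last row
        # holds the non-blank value, labelled with the group's ID
        return pd.DataFrame(np.sort(df.values, 0)[[-1]], [df.name], df.columns)
    
    df.set_index(
        'ID', append=True
    ).groupby(level='ID', group_keys=False).apply(f)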
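The key step of option 2 in isolation: np.sort along axis 0 sorts each column of an object array independently, and because '' compares less than any non-empty string, the last row ends up holding each column's non-blank value. The two-row block below is a hypothetical group with columns apples, pears, oranges.

```python
import numpy as np

# Hypothetical group: two original rows, columns apples, pears, oranges.
block = np.array([['apples', '', ''],
                  ['', 'pears', '']], dtype=object)

# Sort each column; '' moves to the top, so row -1 holds the non-blank value
# (or '' when the column is blank throughout this group).
last = np.sort(block, axis=0)[-1]
print(last.tolist())  # prints ['apples', 'pears', '']
```

If a group had two different non-blank values in one column, only the lexicographically largest would survive, the same trade-off as using max.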
    

    Both yield:

         apples  oranges  pears
    ID                         
    101          oranges       
    134  apples           pears
    576          oranges  pears
    837  apples                
    