Groupby and append lists and strings

被撕碎了的回忆 2021-01-12 02:43

I am trying to group by the values in my "value_1" column, but my last column is made up of lists. When I try to group by my "value_1" column, the column made up o…

2 Answers
  • 2021-01-12 03:23

    You could groupby value_1 and aggregate the columns containing strings with the following function:

    def str_cat(x):
        # Join all strings in the group with ', '; missing values are skipped
        return x.str.cat(sep=', ')
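    A quick check of str_cat on a single group, using hypothetical values (this sketch is not part of the original answer). Series.str.cat skips NaN, but an empty string is an ordinary value and leaves a doubled separator, which is why the empty strings are converted to missing values before grouping:

```python
import numpy as np
import pandas as pd

def str_cat(x):
    # With no `others`, Series.str.cat joins the Series' own elements;
    # missing values are omitted because na_rep defaults to None
    return x.str.cat(sep=', ')

# Hypothetical values for one group
joined = str_cat(pd.Series(['walmart', np.nan, 'kmart']))
with_empty = str_cat(pd.Series(['walmart', '', 'kmart']))
print(joined)      # walmart, kmart
print(with_empty)  # walmart, , kmart
```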
    

    Then use GroupBy.sum ('sum') to concatenate the lists in the list column:

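    import numpy as np
    
    # np.nan rather than None: older pandas treats replace('', None) as a forward fill
    df.replace('', np.nan).groupby('value_1').agg({'list': 'sum', 'value_2': str_cat,
                                                   'value_3': str_cat})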
    
                            list                       value_2  \
    value_1                                                              
    american  [supermarket, connivence, state]  california, nyc, texas   
    canadian             [coffee, sipermarket]          toronto, texas   
    
                        value_3  
    value_1                                 
    american  walmart, kmart, dunkinDonuts  
    canadian         dunkinDonuts, walmart  
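    A self-contained sketch of this approach on a small hypothetical frame (the question's data is truncated, so the values here are made up). Two choices differ from the answer and are assumptions of this sketch: empty strings are replaced with np.nan only in the string columns, and the list column is combined with itertools.chain.from_iterable instead of 'sum', which builds each group's list in one pass rather than by repeated concatenation:

```python
from itertools import chain

import numpy as np
import pandas as pd

def str_cat(x):
    # Join the non-missing strings of one group with ', '
    return x.str.cat(sep=', ')

def concat_lists(x):
    # Concatenate the lists of one group into one flat list
    # (swapped in for 'sum' in this sketch)
    return list(chain.from_iterable(x))

# Hypothetical data with the same columns as in the answer
df = pd.DataFrame({
    'value_1': ['american', 'american', 'canadian', 'canadian'],
    'value_2': ['california', 'nyc', 'toronto', ''],
    'value_3': ['walmart', 'kmart', 'dunkinDonuts', 'walmart'],
    'list': [['supermarket'], ['connivence', 'state'], ['coffee'], ['supermarket']],
})

# Turn empty strings into NaN in the string columns so str_cat skips them
str_cols = ['value_2', 'value_3']
df[str_cols] = df[str_cols].replace('', np.nan)

out = df.groupby('value_1').agg({'list': concat_lists,
                                 'value_2': str_cat,
                                 'value_3': str_cat})
print(out)
```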
    
  • 2021-01-12 03:27

    Build the aggregation dictionary dynamically: map every column except value_1 and list to a string-joining function, and map list to a lambda that flattens the lists with a nested list comprehension:

    # f1 joins the strings of one group, f2 flattens its lists
    f1 = lambda x: ', '.join(x.dropna())
    # alternative that joins only string values
    #f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
    f2 = lambda x: [z for y in x for z in y]
    # every column except value_1 and list gets f1
    d = dict.fromkeys(df.columns.difference(['value_1','list']), f1)
    d['list'] = f2
    
    df = df.groupby('value_1', as_index=False).agg(d)
    print(df)
         value_1                 value_2                value_3  \
    0   american  california, nyc, texas         walmart, kmart   
    1   canadian                 toronto  dunkinDonuts, walmart   
    
                                   list  
    0  [supermarket, connivence, state]  
    1             [coffee, supermarket]  
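    Because the dictionary is built from df.columns.difference, any additional string columns are picked up without editing the code. A sketch with made-up data and a hypothetical extra column value_4 (not in the question):

```python
import pandas as pd

# Hypothetical frame; value_4 stands in for any extra string column
df = pd.DataFrame({
    'value_1': ['a', 'a', 'b'],
    'value_2': ['x', 'y', 'z'],
    'value_3': ['p', 'q', 'r'],
    'value_4': ['m', 'n', 'o'],
    'list': [[1], [2, 3], [4]],
})

f1 = lambda x: ', '.join(x.dropna())       # join the strings of one group
f2 = lambda x: [z for y in x for z in y]   # flatten the lists of one group

# Index.difference returns the remaining column names, sorted
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2
print(list(d))  # ['value_2', 'value_3', 'value_4', 'list']

out = df.groupby('value_1', as_index=False).agg(d)
print(out)
```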
    

    Explanation:

    f1 and f2 are lambda functions.

    f1 first removes missing values (if there are any), then joins the strings with the separator:

    f1 = lambda x: ', '.join(x.dropna())
    

    Alternatively, keep only the string values (this skips missing values, because NaN is a float, not a string), then join them with the separator:

    f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
    

    If missing values are stored as empty strings rather than NaN, filter out the empty strings, then join the rest with the separator:

    f1 = lambda x: ', '.join([y for y in x if y != '']) 
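    A small comparison of the three f1 variants on hypothetical group values; the names drop_na, only_str and no_empty are labels made up for this sketch. It shows which kind of "missing" each variant handles:

```python
import numpy as np
import pandas as pd

# Hypothetical group values: one with NaN, one with an empty string
s_nan = pd.Series(['walmart', np.nan, 'kmart'])
s_empty = pd.Series(['walmart', '', 'kmart'])

drop_na = lambda x: ', '.join(x.dropna())
only_str = lambda x: ', '.join([y for y in x if isinstance(y, str)])
no_empty = lambda x: ', '.join([y for y in x if y != ''])

print(drop_na(s_nan))     # walmart, kmart
print(only_str(s_nan))    # walmart, kmart
print(no_empty(s_empty))  # walmart, kmart
print(drop_na(s_empty))   # walmart, , kmart  ('' is not a missing value)

# no_empty keeps NaN (NaN != ''), so str.join fails on the float
try:
    no_empty(s_nan)
    no_empty_fails_on_nan = False
except TypeError:
    no_empty_fails_on_nan = True
print(no_empty_fails_on_nan)  # True
```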
    

    f2 flattens the lists: within each group the list column holds nested lists such as [['a','b'], ['c']], which f2 turns into a single list ['a', 'b', 'c']:

    f2 = lambda x: [z for y in x for z in y]
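    The nested comprehension reads left to right like two for loops. A tiny sketch on a hypothetical group, with itertools.chain.from_iterable shown as an equivalent standard-library alternative (not used in the answer):

```python
from itertools import chain

import pandas as pd

# Hypothetical slice of the list column for one group
group = pd.Series([['a', 'b'], ['c']])

# Outer loop walks the lists, inner loop walks the items of each list
f2 = lambda x: [z for y in x for z in y]
flat = f2(group)
print(flat)  # ['a', 'b', 'c']

# Equivalent standard-library spelling
flat_chain = list(chain.from_iterable(group))
print(flat_chain == flat)  # True
```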
    