Merge multiple column values into one column in python pandas

我寻月下人不归 2020-11-28 03:07

I have a pandas data frame like this:

   Column1  Column2  Column3  Column4  Column5
 0    a        1        2        3        4
 1    a        3        4        5      NaN
 2    b        6        7        8      NaN
 3    c        7        7      NaN      NaN


        
3 Answers
  • 2020-11-28 03:21

    You can call apply with axis=1 to work row-wise, then convert each value to str and join:

    In [153]:
    df['ColumnA'] = df[df.columns[1:]].apply(
        lambda x: ','.join(x.dropna().astype(int).astype(str)),
        axis=1
    )
    df
    
    Out[153]:
      Column1  Column2  Column3  Column4  Column5  ColumnA
    0       a        1        2        3        4  1,2,3,4
    1       a        3        4        5      NaN    3,4,5
    2       b        6        7        8      NaN    6,7,8
    3       c        7        7      NaN      NaN      7,7
    

    Here I call dropna to get rid of the NaN values. The remaining values also need to be cast back to int, because a row that contained NaN is upcast to float and would otherwise be joined as strings such as '5.0'.
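
    For readers who want to run this end to end, here is a minimal, self-contained sketch. The sample DataFrame is rebuilt from the output shown above, and the helper name join_row is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the output shown in this answer.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})


def join_row(row):
    """Join the non-missing values of one row with commas (illustrative helper)."""
    # Drop NaN, cast back to int (NaN forces the row to float), then to str.
    return ",".join(row.dropna().astype(int).astype(str))


# Apply row-wise to every column except the first one.
df["ColumnA"] = df[df.columns[1:]].apply(join_row, axis=1)
print(df)
```

    The int cast assumes every remaining value is a whole number; with fractional data it would silently truncate, so it should be left out in that case.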

  • 2020-11-28 03:24

    I suggest using .assign:

    # Concatenate the string form of each column, separated by ', '
    df2 = df.assign(ColumnA=df.Column2.astype(str) + ', ' +
                    df.Column3.astype(str) + ', ' +
                    df.Column4.astype(str) + ', ' +
                    df.Column5.astype(str))
    

    It's simple, if a bit long, but it worked for me.
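
    As a rough illustration of how .assign behaves, here is a small sketch on a hypothetical two-column frame with no missing values (the data is made up for the example). Missing values are not handled by this approach and would need to be dealt with separately.

```python
import pandas as pd

# Hypothetical frame without missing values, for illustration only.
df = pd.DataFrame({"Column2": [1, 3], "Column3": [2, 4]})

# assign returns a new DataFrame and leaves df itself unchanged.
df2 = df.assign(ColumnA=df.Column2.astype(str) + ", " + df.Column3.astype(str))

print(df2["ColumnA"].tolist())
```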

  • 2020-11-28 03:25

    If the DataFrame has many columns (say 1,000) and you want to merge only a few of them, starting at a particular column name (Column2 in the question) and continuing for a fixed number of columns after it (here the 3 columns after Column2, plus Column2 itself, as the OP asked), you can select them by position.

    We can get the position of a column with .get_loc():

    source_col_loc = df.columns.get_loc('Column2') # column position starts from 0
    
    # Slice from Column2 itself through the 3 columns after it
    df['ColumnA'] = df.iloc[:, source_col_loc:source_col_loc + 4].apply(
        lambda x: ",".join(x.dropna().astype(int).astype(str)), axis=1)
    
    df
    
      Column1  Column2  Column3  Column4  Column5  ColumnA
    0       a        1        2        3        4  1,2,3,4
    1       a        3        4        5      NaN    3,4,5
    2       b        6        7        8      NaN    6,7,8
    3       c        7        7      NaN      NaN      7,7
    

    NaN values have to be removed with .dropna() (or replaced with .fillna()) before joining; otherwise they end up in the joined string.
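
    The positional approach can be wrapped in a small function. This is only a sketch: the name merge_from and its parameters are hypothetical, the sample data is rebuilt from the output above, and it assumes the column label is unique (so that .get_loc() returns a single integer) and that the values are whole numbers.

```python
import numpy as np
import pandas as pd


def merge_from(df, first_col, n_cols, sep=","):
    """Join n_cols columns, starting at first_col, into one string per row.

    Hypothetical helper for illustration; assumes first_col is a unique label.
    """
    start = df.columns.get_loc(first_col)      # zero-based position
    block = df.iloc[:, start:start + n_cols]   # first_col plus the following columns
    return block.apply(
        lambda row: sep.join(row.dropna().astype(int).astype(str)),
        axis=1,
    )


df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

df["ColumnA"] = merge_from(df, "Column2", 4)
print(df)
```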

    Hope it helps!
