Flatten multiple columns in a dataframe to a single column

独厮守ぢ 2020-12-11 11:07

I have a dataframe like this:

id    other_id_1    other_id_2    other_id_3
1     100           101           102
2     200           201           202
3     300           301           302


        
4 Answers
  • 2020-12-11 11:21

    By using pd.wide_to_long:

    pd.wide_to_long(df, 'other_id_', i='id', j='drop').reset_index().drop('drop', axis=1).sort_values('id')

    Out[36]: 
       id  other_id_
    0   1        100
    3   1        101
    6   1        102
    1   2        200
    4   2        201
    7   2        202
    2   3        300
    5   3        301
    8   3        302
    

    Or use unstack:

    df.set_index('id').unstack().reset_index().drop(columns='level_0').rename(columns={0: 'other_id'})
    
    Out[43]: 
       id  other_id
    0   1       100
    1   2       200
    2   3       300
    3   1       101
    4   2       201
    5   3       301
    6   1       102
    7   2       202
    8   3       302
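
    For anyone who wants to run the wide_to_long variant end to end, here is a minimal sketch. The sample frame is reconstructed from the outputs above, and the index level name "suffix" (instead of 'drop') is just an illustrative choice. It also sorts on both columns and renumbers the rows, which the one-liner above does not do.

```python
import pandas as pd

# Sample frame reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

# wide_to_long splits each column name into the stub "other_id_" and a
# numeric suffix; the suffix ends up in the index level named by j.
long_df = pd.wide_to_long(df, stubnames="other_id_", i="id", j="suffix")

# Move the index levels back into ordinary columns.
long_df = long_df.reset_index()

# Drop the suffix, order by id and then by value, and renumber the rows.
flat = (
    long_df.drop(columns="suffix")
    .sort_values(["id", "other_id_"])
    .reset_index(drop=True)
)
print(flat)
```

    Note that the value column keeps the stub name other_id_ (with the trailing underscore); add .rename(columns={"other_id_": "other_id"}) if that matters.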
    
  • 2020-12-11 11:32

    If id isn't the index, set it first:

    df = df.set_index('id')
    
    df
    
        other_id_1  other_id_2  other_id_3
    id                                    
    1          100         101         102
    2          200         201         202
    3          300         301         302
    

    Now call the pd.DataFrame constructor on the flattened values. Each index label has to be repeated once per column, which np.repeat does.

    df_new = pd.DataFrame({'other_id' : df.values.reshape(-1,)}, 
                             index=np.repeat(df.index, len(df.columns)))
    df_new
    
        other_id
    id          
    1        100
    1        101
    1        102
    2        200
    2        201
    2        202
    3        300
    3        301
    3        302
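
    The reason this lines up is that reshape(-1) flattens row by row, so the values come out in the same order as the repeated index labels. A self-contained sketch of that alignment follows; the sample frame is reconstructed from the output above, and it uses Index.repeat, the pandas method, in place of np.repeat.

```python
import pandas as pd

# Sample frame with id already set as the index (reconstructed from above).
df = pd.DataFrame(
    {
        "other_id_1": [100, 200, 300],
        "other_id_2": [101, 201, 301],
        "other_id_3": [102, 202, 302],
    },
    index=pd.Index([1, 2, 3], name="id"),
)

# Row-major flattening: all of row 1's values, then row 2's, then row 3's.
values = df.to_numpy().reshape(-1)

# Each id label repeated once per column, matching the order of `values`.
labels = df.index.repeat(df.shape[1])

df_new = pd.DataFrame({"other_id": values}, index=labels)
print(df_new)
```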
    
  • 2020-12-11 11:37

    Well, if you haven't already, set id as the index:

    >>> df
       id  other_id_1  other_id_2  other_id_3
    0   1         100         101         102
    1   2         200         201         202
    2   3         300         301         302
    >>> df.set_index('id', inplace=True)
    >>> df
        other_id_1  other_id_2  other_id_3
    id
    1          100         101         102
    2          200         201         202
    3          300         301         302
    

    Then, you can simply use pd.concat:

    >>> df = pd.concat([df[col] for col in df])
    >>> df
    id
    1    100
    2    200
    3    300
    1    101
    2    201
    3    301
    1    102
    2    202
    3    302
    dtype: int64
    

    And if you need the values sorted:

    >>> df.sort_values()
    id
    1    100
    1    101
    1    102
    2    200
    2    201
    2    202
    3    300
    3    301
    3    302
    dtype: int64
    >>>
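
    The result above is a Series without a name. If you need a two-column DataFrame instead, something along these lines should work; this is only a sketch, with the sample frame reconstructed from the session above and the column name other_id chosen for illustration.

```python
import pandas as pd

# Sample frame with id as the index (reconstructed from the session above).
df = pd.DataFrame(
    {
        "other_id_1": [100, 200, 300],
        "other_id_2": [101, 201, 301],
        "other_id_3": [102, 202, 302],
    },
    index=pd.Index([1, 2, 3], name="id"),
)

# Stack the columns on top of each other; the id index is kept for every piece.
s = pd.concat([df[col] for col in df])

# Name the values, turn the id index back into a column, and sort.
flat = (
    s.rename("other_id")
    .reset_index()
    .sort_values(["id", "other_id"], ignore_index=True)
)
print(flat)
```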
    
  • 2020-12-11 11:40

    One more (or rather two):)

    pd.melt(df, id_vars='id', value_vars=['other_id_1', 'other_id_2', 'other_id_3'], value_name='other_id')\
    .drop(columns='variable').sort_values(by='id')
    

    Option 2:

    df.set_index('id').stack().reset_index(1, drop=True).reset_index()\
    .rename(columns={0: 'other_id'})
    

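    Either way you get the rows below (the melt version keeps its original index labels unless you also call .reset_index(drop=True)):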

       id  other_id
    0   1       100
    1   1       101
    2   1       102
    3   2       200
    4   2       201
    5   2       202
    6   3       300
    7   3       301
    8   3       302
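
    To check that the two options really agree, a small runnable comparison can help. This is a sketch under a few assumptions: the sample frame is reconstructed from the output above, the melt result is sorted on both columns with a fresh index, and the stacked Series is named before the final reset_index rather than renamed afterwards.

```python
import pandas as pd

# Sample frame with id as an ordinary column (reconstructed from above).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

# Option 1: melt every non-id column, then discard the column-name field.
melted = (
    df.melt(id_vars="id", value_name="other_id")
    .drop(columns="variable")
    .sort_values(["id", "other_id"], ignore_index=True)
)

# Option 2: stack the columns into a Series and drop the column-name level.
stacked = (
    df.set_index("id")
    .stack()
    .reset_index(level=1, drop=True)
    .rename("other_id")
    .reset_index()
)

# Compare the two results column by column.
same = (
    melted["id"].tolist() == stacked["id"].tolist()
    and melted["other_id"].tolist() == stacked["other_id"].tolist()
)
print(same)
```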
    