Reshaping dataframes in pandas based on column labels

北海茫月 2020-12-14 04:51

What is the best way to reshape the following dataframe in pandas? This DataFrame df has x,y values for each sample (s1 and s2

1 answer
  • 2020-12-14 05:13

    I'm assuming you already have the DataFrame. In that case, you can turn the columns into a MultiIndex, then call stack followed by reset_index. Note that you will still have to rename and reorder the columns and sort by sample to get exactly what you posted in the question:

    In [3]: import numpy, pandas

    In [4]: df = pandas.DataFrame({"s1_x": numpy.random.randn(10), "s1_y": numpy.random.randn(10), "s2_x": numpy.random.randn(10), "s2_y": numpy.random.randn(10)})
    
    In [5]: df.columns = pandas.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
    
    In [6]: df.stack(0).reset_index(1)
    Out[6]: 
      level_1         x         y
    0      s1  0.897994 -0.278357
    0      s2 -0.008126 -1.701865
    1      s1 -1.354633 -0.890960
    1      s2 -0.773428  0.003501
    2      s1 -1.499422 -1.518993
    2      s2  0.240226  1.773427
    3      s1 -1.090921  0.847064
    3      s2 -1.061303  1.557871
    4      s1 -1.697340 -0.160952
    4      s2 -0.930642  0.182060
    5      s1 -0.356076 -0.661811
    5      s2  0.539875 -1.033523
    6      s1 -0.687861 -1.450762
    6      s2  0.700193  0.658959
    7      s1 -0.130422 -0.826465
    7      s2 -0.423473 -1.281856
    8      s1  0.306983  0.433856
    8      s2  0.097279 -0.256159
    9      s1  0.498057  0.147243
    9      s2  1.312578  0.111837
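
    The rename, reorder and sort steps mentioned above are not shown in the session, so here is a minimal sketch of them. The target layout (columns sample, x, y, sorted by sample) is an assumption, since the layout requested in the question is not visible here; the names rng, wide, long and "sample" are mine, not from the original post.

```python
import numpy as np
import pandas as pd

# Example data in the same shape as the session above (values are random).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((10, 4)),
    columns=["s1_x", "s1_y", "s2_x", "s2_y"],
)

# Split "s1_x" into ("s1", "x") so the columns become a two-level MultiIndex.
wide = df.copy()
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split("_")) for c in wide.columns]
)

long = (
    wide.stack(0)                           # move the sample level into the row index
    .reset_index(level=1)                   # turn it into a column named "level_1"
    .rename(columns={"level_1": "sample"})  # assumed column name
    .sort_values("sample", kind="stable")   # group rows by sample, keep row order
    .loc[:, ["sample", "x", "y"]]           # assumed column order
)
print(long)
```

    The stable sort keeps the original row order within each sample, which is usually what you want when the rows are observations.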
    

    You can skip the MultiIndex conversion step if you create the DataFrame with MultiIndex columns in the first place.
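
    For illustration, a sketch of building the frame with MultiIndex columns up front, assuming the data arrives as a 10x4 array ordered s1 x, s1 y, s2 x, s2 y (that ordering and the variable names are assumptions for the example):

```python
import numpy as np
import pandas as pd

# Two-level column labels: sample on the outer level, coordinate on the inner.
columns = pd.MultiIndex.from_product([["s1", "s2"], ["x", "y"]])

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=columns)

# No string splitting needed: stack the sample level directly.
long = df.stack(0).reset_index(level=1)
print(long.head())
```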

    Edit: use merge to join the original ids back in:

    In [59]: df
    Out[59]: 
       names      s1_x      s1_y      s2_x      s2_y
    0      0  0.732099  0.018387  0.299856  0.737142
    1      1  0.914755 -0.798159 -0.732868 -1.279311
    2      2 -1.063558  0.161779 -0.115751 -0.251157
    3      3 -1.185501  0.095147 -1.343139 -0.003084
    4      4  0.622400 -0.299726  0.198710 -0.383060
    5      5  0.179318  0.066029 -0.635507  1.366786
    6      6 -0.820099  0.066067  1.113402  0.002872
    7      7  0.711627 -0.182925  1.391194 -2.788434
    8      8 -1.124092  1.303375  0.202691 -0.225993
    9      9 -0.179026  0.847466 -1.480708 -0.497067
    
    In [60]: id = df.loc[:, ['names']]
    
    In [61]: df.columns = pandas.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
    
    In [62]: pandas.merge(df.stack(0).reset_index(1), id, left_index=True, right_index=True)
    Out[62]: 
      level_1         x         y  names
    0      s1  0.732099  0.018387      0
    0      s2  0.299856  0.737142      0
    1      s1  0.914755 -0.798159      1
    1      s2 -0.732868 -1.279311      1
    2      s1 -1.063558  0.161779      2
    2      s2 -0.115751 -0.251157      2
    3      s1 -1.185501  0.095147      3
    3      s2 -1.343139 -0.003084      3
    4      s1  0.622400 -0.299726      4
    4      s2  0.198710 -0.383060      4
    5      s1  0.179318  0.066029      5
    5      s2 -0.635507  1.366786      5
    6      s1 -0.820099  0.066067      6
    6      s2  1.113402  0.002872      6
    7      s1  0.711627 -0.182925      7
    7      s2  1.391194 -2.788434      7
    8      s1 -1.124092  1.303375      8
    8      s2  0.202691 -0.225993      8
    9      s1 -0.179026  0.847466      9
    9      s2 -1.480708 -0.497067      9
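
    A self-contained sketch of the same merge idea for a recent pandas. One deliberate difference, which is my assumption rather than part of the session above: the names column is set aside and dropped before the labels are split, so that every remaining label splits into exactly two parts. The variable names ids, values and long are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((10, 4)),
    columns=["s1_x", "s1_y", "s2_x", "s2_y"],
)
df.insert(0, "names", np.arange(10))

# Keep the id column aside, indexed by the original row labels.
ids = df[["names"]]

# Reshape only the value columns.
values = df.drop(columns="names")
values.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split("_")) for c in values.columns]
)
stacked = values.stack(0).reset_index(level=1)

# Join the ids back on the row labels that both frames share.
long = pd.merge(stacked, ids, left_index=True, right_index=True)
print(long)
```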
    

    Alternatively, set names as the index before stacking, so that no merge is needed:

    In [64]: df
    Out[64]: 
       names      s1_x      s1_y      s2_x      s2_y
    0      0  0.744742 -1.123403  0.212736  0.005440
    1      1  0.465075 -0.673491  1.467156 -0.176298
    2      2 -1.111566  0.168043 -0.102142 -1.072461
    3      3  1.226537 -1.147357 -1.583762 -1.236582
    4      4  1.137675  0.224422  0.738988  1.528416
    5      5 -0.237014 -1.110303 -0.770221  1.389714
    6      6 -0.659213  2.305374 -0.326253  1.416778
    7      7  1.524214 -0.395451 -1.884197  0.524606
    8      8  0.375112 -0.622555  0.295336  0.927208
    9      9  1.168386 -0.291899 -1.462098  0.250889
    
    In [65]: df = df.set_index('names')
    
    In [66]: df.columns = pandas.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
    
    In [67]: df.stack(0).reset_index(1)
    Out[67]: 
          level_1         x         y
    names                            
    0          s1  0.744742 -1.123403
    0          s2  0.212736  0.005440
    1          s1  0.465075 -0.673491
    1          s2  1.467156 -0.176298
    2          s1 -1.111566  0.168043
    2          s2 -0.102142 -1.072461
    3          s1  1.226537 -1.147357
    3          s2 -1.583762 -1.236582
    4          s1  1.137675  0.224422
    4          s2  0.738988  1.528416
    5          s1 -0.237014 -1.110303
    5          s2 -0.770221  1.389714
    6          s1 -0.659213  2.305374
    6          s2 -0.326253  1.416778
    7          s1  1.524214 -0.395451
    7          s2 -1.884197  0.524606
    8          s1  0.375112 -0.622555
    8          s2  0.295336  0.927208
    9          s1  1.168386 -0.291899
    9          s2 -1.462098  0.250889
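
    If you would rather not end up with a column called level_1, one option is to name the column levels before stacking. This is a sketch only; the level names "sample" and "coord" are made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((10, 4)),
    columns=["s1_x", "s1_y", "s2_x", "s2_y"],
)
df.insert(0, "names", np.arange(10))

indexed = df.set_index("names")
indexed.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split("_")) for c in indexed.columns],
    names=["sample", "coord"],  # hypothetical level names
)

# Stack by level name; reset_index then yields "names" and "sample" columns.
long = indexed.stack("sample").reset_index()
print(long)
```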
    