Python pandas melting data to multiple columns and column names in another column

谁说胖子不能爱 submitted on 2019-12-13 16:40:42

Question


I have a dataframe whose data I want to melt into multiple target columns. This is the code I tried:

grp2 = pd.lreshape(grp1, cols.groupby(cols.str.split('_').str[1])).sort_values('ACCT_NAME')

With the line above, I lose the column names.

grp2 = pd.melt(grp1 , id_vars = ['Client' , 'Industry'] , var_name = "H Year" , value_name = 'Count')

With the line above, I don't get multiple target columns.

From DF

Client  INDUSTRY  1H2016_6MO  2H2016_6MO  1H2017_6MO  2H2017_6MO  1H2016_12MO  2H2016_12MO  1H2017_12MO  2H2017_12MO
XXX     AAA       1           0           0           0           1            1            0            0
YYY     BBB       0           0           1           0           0            0            0            1
ZZZ     CCC       1           1           0           0           0            0            1            1
XXX     AAA       1           0           0           0           1            1            0            0

TO DF

Client  INDUSTRY    Year_Half   6MO 12MO
XXX     AAA         1H2016      2   2
XXX     AAA         2H2016      0   2
XXX     AAA         1H2017      0   0
XXX     AAA         2H2017      0   0
YYY     BBB         1H2016      0   0
YYY     BBB         2H2016      0   0
YYY     BBB         1H2017      1   0
YYY     BBB         2H2017      0   1
ZZZ     CCC         1H2016      1   0
ZZZ     CCC         2H2016      1   0
ZZZ     CCC         1H2017      0   1
ZZZ     CCC         2H2017      0   1

Please advise on a solution. I have seen other questions, but they don't move the column names into separate columns.


Answer 1:


Use:

  • set_index for separate columns
  • create MultiIndex by split
  • reshape by stack

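# Move the identifier columns into the index so only the value columns remain
df = df.set_index(['Client','INDUSTRY'])
# Split names such as '1H2016_6MO' into a two-level MultiIndex
df.columns = df.columns.str.split('_', expand=True)
# Stack the first level (the year half) into rows and name the new column
df = df.stack(0).reset_index().rename(columns={'level_2':'Year_Half'})
print(df)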
   Client INDUSTRY Year_Half  12MO  6MO
0     XXX      AAA    1H2016     1    1
1     XXX      AAA    1H2017     0    0
2     XXX      AAA    2H2016     1    0
3     XXX      AAA    2H2017     0    0
4     YYY      BBB    1H2016     0    0
5     YYY      BBB    1H2017     0    1
6     YYY      BBB    2H2016     0    0
7     YYY      BBB    2H2017     1    0
8     ZZZ      CCC    1H2016     0    1
9     ZZZ      CCC    1H2017     1    0
10    ZZZ      CCC    2H2016     0    1
11    ZZZ      CCC    2H2017     1    0
12    XXX      AAA    1H2016     1    1
13    XXX      AAA    1H2017     0    0
14    XXX      AAA    2H2016     1    0
15    XXX      AAA    2H2017     0    0
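The output above keeps the duplicated XXX/AAA row as separate rows, whereas the table requested in the question shows summed counts (for example 2 for XXX in 1H2016). A minimal sketch of one way to get there, assuming duplicate Client/INDUSTRY rows should simply be summed before reshaping; the names `agg` and `out` are only illustrative:

```python
import pandas as pd

# Sample data copied from the question (the XXX/AAA row appears twice).
df = pd.DataFrame({
    'Client':      ['XXX', 'YYY', 'ZZZ', 'XXX'],
    'INDUSTRY':    ['AAA', 'BBB', 'CCC', 'AAA'],
    '1H2016_6MO':  [1, 0, 1, 1],
    '2H2016_6MO':  [0, 0, 1, 0],
    '1H2017_6MO':  [0, 1, 0, 0],
    '2H2017_6MO':  [0, 0, 0, 0],
    '1H2016_12MO': [1, 0, 0, 1],
    '2H2016_12MO': [1, 0, 0, 1],
    '1H2017_12MO': [0, 0, 1, 0],
    '2H2017_12MO': [0, 1, 1, 0],
})

# Assumption: duplicate Client/INDUSTRY rows are meant to be added together.
# groupby also moves both key columns into the index.
agg = df.groupby(['Client', 'INDUSTRY']).sum()

# Split '1H2016_6MO' into ('1H2016', '6MO') to get two column levels.
agg.columns = agg.columns.str.split('_', expand=True)

# Stack the year-half level into rows, fix the column order, name the levels.
out = (agg.stack(0)[['6MO', '12MO']]
          .rename_axis(['Client', 'INDUSTRY', 'Year_Half'])
          .reset_index())
print(out)
```

Row order within each client depends on the pandas version, because older releases sort the stacked level while newer ones keep the order of appearance; sort explicitly if the order matters.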

If only the 6MO and 12MO values are needed and the column order matters:

df = df.set_index(['Client','INDUSTRY'])
df.columns = df.columns.str.split('_', expand=True)
df = (df.stack(0)
       .reindex(columns=['6MO','12MO'])  # reindex_axis was removed in pandas 1.0
       .reset_index()
       .rename(columns={'level_2':'Year_Half'}))
print(df)
   Client INDUSTRY Year_Half  6MO  12MO
0     XXX      AAA    1H2016    1     1
1     XXX      AAA    1H2017    0     0
2     XXX      AAA    2H2016    0     1
3     XXX      AAA    2H2017    0     0
4     YYY      BBB    1H2016    0     0
5     YYY      BBB    1H2017    1     0
6     YYY      BBB    2H2016    0     0
7     YYY      BBB    2H2017    0     1
8     ZZZ      CCC    1H2016    1     0
9     ZZZ      CCC    1H2017    0     1
10    ZZZ      CCC    2H2016    1     0
11    ZZZ      CCC    2H2017    0     1
12    XXX      AAA    1H2016    1     1
13    XXX      AAA    1H2017    0     0
14    XXX      AAA    2H2016    0     1
15    XXX      AAA    2H2017    0     0
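Since the question started from melt, the same reshape can also be sketched as melt, then split, then pivot_table. This is an alternative to the stack approach and not part of the original answer. The helper column names `col` and `Period` are made up for illustration, and summing duplicate rows is an assumption based on the requested output:

```python
import pandas as pd

# Sample data copied from the question (the XXX/AAA row appears twice).
columns = ['Client', 'INDUSTRY',
           '1H2016_6MO', '2H2016_6MO', '1H2017_6MO', '2H2017_6MO',
           '1H2016_12MO', '2H2016_12MO', '1H2017_12MO', '2H2017_12MO']
rows = [
    ['XXX', 'AAA', 1, 0, 0, 0, 1, 1, 0, 0],
    ['YYY', 'BBB', 0, 0, 1, 0, 0, 0, 0, 1],
    ['ZZZ', 'CCC', 1, 1, 0, 0, 0, 0, 1, 1],
    ['XXX', 'AAA', 1, 0, 0, 0, 1, 1, 0, 0],
]
df = pd.DataFrame(rows, columns=columns)

# Step 1: melt everything except the identifiers into one long column.
long = df.melt(id_vars=['Client', 'INDUSTRY'],
               var_name='col', value_name='Count')

# Step 2: split '1H2016_6MO' into the year half and the period.
long[['Year_Half', 'Period']] = long['col'].str.split('_', expand=True)

# Step 3: pivot the period back into columns, summing duplicate rows.
out = (long.pivot_table(index=['Client', 'INDUSTRY', 'Year_Half'],
                        columns='Period', values='Count', aggfunc='sum')
           .reindex(columns=['6MO', '12MO'])
           .reset_index()
           .rename_axis(None, axis=1))
print(out)
```

The pivot_table step is what turns the single Count column back into separate 6MO and 12MO columns, which is the part a plain melt cannot do on its own.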


Source: https://stackoverflow.com/questions/46234549/python-pandas-melting-data-to-multiple-columns-and-coulmn-names-in-another-colum
