Transpose dataframe based on column list

时光总嘲笑我的痴心妄想 posted on 2021-02-16 10:06:15

Question


I have a dataframe in the following structure:

cNames  | cValues   |  number  
[a,b,c] | [1,2,3]   |  10      
[a,b,d] | [55,66,77]|  20

I would like to transpose it so that each name in cNames becomes its own column, filled with the matching entry from cValues.
A plain transpose does not achieve this, because I need one column for each name that appears in the lists.
The desired output:

a   | b   | c   | d   |  number
1   | 2   | 3   | NaN | 10
55  | 66  | NaN | 77  | 20

How can I achieve this result?
Thanks!

The code to create the DF:

import pandas as pd

d = {'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
     'cValues': [[1, 2, 3], [55, 66, 77]],
     'number': [10, 20]}
df = pd.DataFrame(data=d)

Answer 1:


One option is concat:

pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx) 
               for idx, x in df.iterrows()], 
          axis=1
         ).T.join(df.iloc[:,2:])

Or a DataFrame construction:

pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
              for idx, x in df.iterrows()
            }).T.join(df.iloc[:,2:])

Output:

      a     b    c     d  number
0   1.0   2.0  3.0   NaN      10
1  55.0  66.0  NaN  77.0      20
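
The dict-based construction can also be written without iterrows. The following is a minimal sketch of that variant, not the answerer's code; the names `records` and `wide` are illustrative, and it assumes every row's cNames and cValues lists have the same length.

```python
import pandas as pd

df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})

# Build one {name: value} dict per row; names absent from a row become NaN.
records = [dict(zip(names, values))
           for names, values in zip(df['cNames'], df['cValues'])]

# Passing index=df.index keeps the rows aligned with the original frame for join().
wide = pd.DataFrame(records, index=df.index).join(df[['number']])
print(wide)
```

Because each row becomes a record directly, no transpose is needed here.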

Update: performance of each approach, sorted by run time on the sample data

DataFrame

%%timeit
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
              for idx, x in df.iterrows()
            }).T.join(df.iloc[:,2:])
1.29 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

concat:

%%timeit
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx) 
               for idx, x in df.iterrows()], 
          axis=1
         ).T.join(df.iloc[:,2:])
2.03 ms ± 86.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 

KJDII's new series

%%timeit
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)

2.09 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Scott's apply(pd.Series.explode)

%%timeit
df.apply(pd.Series.explode)\
  .set_index(['number', 'cNames'], append=True)['cValues']\
  .unstack()\
  .reset_index()\
  .drop('level_0', axis=1)

4.9 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

wwnde's set_index.apply(explode)

%%timeit
g=df.set_index('number').apply(lambda x: x.explode()).reset_index()
g['cValues']=g['cValues'].astype(int)
pd.pivot_table(g, index=["number"],values=["cValues"],columns=["cNames"]).droplevel(0, axis=1).reset_index()

7.27 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

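Celius' double explode (note that chaining two explode calls pairs every name with every value in a row, so this variant does not produce the desired output)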

%%timeit
df1 = df.explode('cNames').explode('cValues')
df1['cValues'] = pd.to_numeric(df1['cValues'])
df1.pivot_table(columns='cNames',index='number',values='cValues')

9.42 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
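
The timings above were taken on the two-row sample frame, so they mostly measure fixed overhead. A rough sketch for repeating one measurement on larger synthetic input follows; `make_frame` and `via_dicts` are hypothetical helper names, the row count and letter pool are arbitrary assumptions, and no particular timing is implied.

```python
import random
import timeit

import pandas as pd


def make_frame(n_rows, seed=0):
    """Build a synthetic frame shaped like the one in the question (hypothetical helper)."""
    rng = random.Random(seed)
    letters = list('abcdefgh')
    names = [rng.sample(letters, 3) for _ in range(n_rows)]
    values = [[rng.randint(0, 99) for _ in range(3)] for _ in range(n_rows)]
    return pd.DataFrame({'cNames': names, 'cValues': values,
                         'number': range(n_rows)})


def via_dicts(df):
    """The DataFrame-construction approach timed above, wrapped in a function."""
    return pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']))
                         for idx, x in df.iterrows()}).T.join(df.iloc[:, 2:])


big = make_frame(1_000)
runs = 3
seconds = timeit.timeit(lambda: via_dicts(big), number=runs)
print(f"via_dicts: {seconds / runs:.4f} s per call on {len(big)} rows")
```

The other approaches can be wrapped in functions the same way and timed against the same frame.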



Answer 2:


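You can explode both list columns together and then pivot the table back to the desired output. Passing a list of columns to explode() requires pandas 1.3 or later. Chaining two separate explode() calls instead pairs every name with every value in the row, so pivot_table (which averages by default) would return 2.0 and 66.0 in every filled cell.

df = df.explode(['cNames', 'cValues'])
df['cValues'] = pd.to_numeric(df['cValues'])
print(df.pivot_table(columns='cNames', index='number', values='cValues'))

Output:

cNames     a     b    c     d
number                       
10       1.0   2.0  3.0   NaN
20      55.0  66.0  NaN  77.0

Unfortunately, explode() returns columns of dtype object, so cValues must be converted with pd.to_numeric() before pivoting. Otherwise there are no numeric values to aggregate.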
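
A small check makes the dtype issue and the row pairing visible. This sketch assumes pandas 1.3 or later (for passing a list of columns to explode); the names `paired` and `chained` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})

# Exploding both columns in one call keeps each name next to its own value.
paired = df.explode(['cNames', 'cValues'])

# Exploding one column after the other pairs every name with every value.
chained = df.explode('cNames').explode('cValues')

print(len(paired), len(chained))   # 3 rows per input row vs. 3 * 3 rows per input row
print(paired['cValues'].dtype)     # exploded list columns come back as object

# Converting to a numeric dtype gives pivot_table something to aggregate.
paired['cValues'] = pd.to_numeric(paired['cValues'])
print(paired['cValues'].dtype)
```

The row counts show why the order of operations matters: only the paired frame has exactly one value per name and row.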




Answer 3:



import pandas as pd

d = {'cNames': [['a','b','c'], ['a','b','d']], 'cValues': [[1,2,3], 
[55,66,77]], 'number': [10,20]}
df = pd.DataFrame(data=d)

df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
df = pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)
print(df)

   number     a     b    c     d
0      10   1.0   2.0  3.0   NaN
1      20  55.0  66.0  NaN  77.0

If column order matters:


columns = ['a', 'b', 'c', 'd', 'number']
df = df[columns]

      a     b    c     d  number
0   1.0   2.0  3.0   NaN      10
1  55.0  66.0  NaN  77.0      20
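
When the set of names is not known in advance, the column list does not have to be hard-coded. A minimal sketch, assuming the only requirement is that 'number' ends up last (the name `ordered` is illustrative):

```python
import pandas as pd

# Same shape as the result printed above, with 'number' in first position.
df = pd.DataFrame({'number': [10, 20],
                   'a': [1.0, 55.0], 'b': [2.0, 66.0],
                   'c': [3.0, None], 'd': [None, 77.0]})

# Keep every other column in its current order and append 'number' last.
ordered = [col for col in df.columns if col != 'number'] + ['number']
df = df[ordered]
print(df.columns.tolist())
```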





Answer 4:


I'll toss my hat into this ring:

df.apply(pd.Series.explode)\
  .set_index(['number', 'cNames'], append=True)['cValues']\
  .unstack()\
  .reset_index()\
  .drop('level_0', axis=1)

Output:

cNames  number   a   b    c    d
0           10   1   2    3  NaN
1           20  55  66  NaN   77
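
The output above mixes integers with NaN, which suggests the unstacked columns are still of dtype object. If numeric columns are needed, one possible follow-up is sketched here. It is a variant rather than the answerer's code: it assumes pandas 1.3 or later and uses a multi-column explode in place of apply(pd.Series.explode); the names `long_df` and `wide` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})

# One row per (name, value) pair; the original row label is repeated.
long_df = df.explode(['cNames', 'cValues'])

wide = (long_df.set_index(['number', 'cNames'], append=True)['cValues']
               .unstack()                     # names become columns
               .reset_index(level='number')   # bring 'number' back as a column
               .apply(pd.to_numeric))         # object columns -> numeric dtypes
print(wide.dtypes)
```

Columns with a missing entry become floating point, since NaN cannot be stored in a plain integer column.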



Answer 5:


g = df.set_index('number').apply(lambda x: x.explode()).reset_index()
g['cValues'] = g['cValues'].astype(int)
pd.pivot_table(g, index=["number"], values=["cValues"], columns=["cNames"]).droplevel(0, axis=1).reset_index()

cNames  number     a     b    c     d
0           10   1.0   2.0  3.0   NaN
1           20  55.0  66.0  NaN  77.0


Source: https://stackoverflow.com/questions/66070517/transpose-dataframe-based-on-column-list
