Question
I have a dataframe:
subject A_target_word_gd A_target_word_fd B_target_word_gd B_target_word_fd subject_type
1 1 2 3 4 mild
2 11 12 13 14 moderate
I want to melt it into a dataframe that looks like this:
cond subject subject_type value_type value
A 1 mild gd 1
A 1 mild fd 2
B 1 mild gd 3
B 1 mild fd 4
A 2 moderate gd 11
A 2 moderate fd 12
B 2 moderate gd 13
B 2 moderate fd 14
...
...
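That is, I want to melt based on the delimiter in the column names: the part before the first underscore becomes cond and the part after the last underscore becomes value_type.
What is the best way to do that?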
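For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample frame. The values are copied from the table in the question; the dtypes (plain integers and strings) are an assumption, since the question does not state them.

```python
import pandas as pd

# Sample data copied from the question; dtypes are assumed.
df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})
print(df)
```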
Answer 1:
One more approach, very similar to what @anky_91 posted (I had already started typing it before he posted, so I am putting it out there anyway).
new_df = pd.melt(df, id_vars=['subject_type', 'subject'], var_name='abc').sort_values(by=['subject', 'subject_type'])
new_df['cond'] = new_df['abc'].apply(lambda x: x.split('_')[0])
new_df['value_type'] = new_df.pop('abc').apply(lambda x: x.split('_')[-1])
new_df
Output
subject_type subject value cond value_type
0 mild 1 1 A gd
2 mild 1 2 A fd
4 mild 1 3 B gd
6 mild 1 4 B fd
1 moderate 2 11 A gd
3 moderate 2 12 A fd
5 moderate 2 13 B gd
7 moderate 2 14 B fd
Answer 2:
Here is my way, using melt and Series.str.split():
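m = df.melt(['subject', 'subject_type'])
# keep only the first and last pieces of each original column name
n = m['variable'].str.split('_', expand=True).iloc[:, [0, -1]]
n.columns = ['cond', 'value_type']
# drop(columns=...) replaces the positional axis argument, which pandas 2.0 removed
m = m.drop(columns='variable').assign(**n).sort_values('subject')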
print(m)
subject subject_type value cond value_type
0 1 mild 1 A gd
2 1 mild 2 A fd
4 1 mild 3 B gd
6 1 mild 4 B fd
1 2 moderate 11 A gd
3 2 moderate 12 A fd
5 2 moderate 13 B gd
7 2 moderate 14 B fd
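A variant of the same melt-then-split idea, offered only as a sketch and not as the answerer's method: Series.str.extract with named regex groups produces both new columns in one step. The pattern assumes every value column has the form <cond>_target_word_<value_type>, and the names long_df and parts are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})

# Wide to long: one row per (subject, original column name).
long_df = df.melt(id_vars=["subject", "subject_type"],
                  var_name="variable", value_name="value")

# Named groups become the column names of the extracted frame.
# Assumption: all value columns match <cond>_target_word_<value_type>.
parts = long_df["variable"].str.extract(
    r"^(?P<cond>[^_]+)_target_word_(?P<value_type>.+)$"
)

long_df = (
    pd.concat([long_df.drop(columns="variable"), parts], axis=1)
    .sort_values("subject", kind="mergesort")  # stable sort keeps melt order within a subject
    .reset_index(drop=True)
)
print(long_df)
```

Compared with splitting on every underscore, the explicit pattern yields NaN for a column name that does not follow the expected form instead of silently picking the wrong pieces.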
Answer 3:
Set the index to subject and subject_type. Split the columns on the string _target_word_ to make MultiIndex columns. Rename the column axis to proper names, then stack and reset_index.
df1 = df.set_index(['subject', 'subject_type'])
df1.columns = df1.columns.str.split('_target_word_', expand=True)
df_final = df1.rename_axis(['cond','value_type'],axis=1).stack([0,1]).reset_index(name='value')
Output
subject subject_type cond value_type value
0 1 mild A fd 2
1 1 mild A gd 1
2 1 mild B fd 4
3 1 mild B gd 3
4 2 moderate A fd 12
5 2 moderate A gd 11
6 2 moderate B fd 14
7 2 moderate B gd 13
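To make the intermediate step visible, here is a small sketch of what the column split produces before stacking. The sample frame is rebuilt from the question, and the name wide is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})

wide = df.set_index(["subject", "subject_type"])

# Splitting an Index with expand=True returns a two-level MultiIndex.
wide.columns = wide.columns.str.split("_target_word_", expand=True)
wide = wide.rename_axis(columns=["cond", "value_type"])

print(wide.columns.tolist())
# [('A', 'gd'), ('A', 'fd'), ('B', 'gd'), ('B', 'fd')]
```

Once the columns carry two named levels, stacking both levels moves cond and value_type into the row index, and reset_index turns them into ordinary columns. The row order after stacking can differ between pandas versions, so sort explicitly if the order matters.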
Answer 4:
First reshape with DataFrame.set_index, DataFrame.stack and DataFrame.reset_index, then split the column that holds the original column names on _ with Series.str.split to create the new columns:
df = df.set_index(['subject','subject_type']).stack().reset_index(name='value')
df[['cond','value_type']] = df.pop('level_2').str.split('_', expand=True).iloc[:, [0,-1]]
print(df)
subject subject_type value cond value_type
0 1 mild 1 A gd
1 1 mild 2 A fd
2 1 mild 3 B gd
3 1 mild 4 B fd
4 2 moderate 11 A gd
5 2 moderate 12 A fd
6 2 moderate 13 B gd
7 2 moderate 14 B fd
Source: https://stackoverflow.com/questions/59550804/melt-column-by-substring-of-the-columns-name-in-pandas-python