Question
I have two dataframes:
>>> df1
[Output]: col1 col2 col3 col4
a abc 10 str1
b abc 20 str2
c def 20 str2
d abc 30 str2
>>> df2
[Output]: col1 col2 col3 col5 col6
d abc 30 str6 47
b abc 20 str5 66
c def 20 str7 53
a abc 10 str5 21
Below is what I want to generate:
>>> df_merged
[Output]: col1 col2 col5
a abc str5
b abc str5
c def str7
d abc str6
I don't want to generate more than 4 rows and that is usually what happens when I try to merge the dataframes. Thanks for the tips!
Answer 1:
Use .merge, subselecting the correct columns and using col1 and col2 as the key columns:
df1[['col1', 'col2']].merge(df2[['col1', 'col2', 'col5']], on=['col1', 'col2'])
col1 col2 col5
0 a abc str5
1 b abc str5
2 c def str7
3 d abc str6
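For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frames are rebuilt from the tables in the question. The merge on col2 alone is only a guess at why the asker ended up with more than 4 rows, since the question does not show the failing call. The validate="one_to_one" argument is an optional addition that is not part of the original answer; it makes pandas raise an error if the key columns are not unique on both sides.

```python
import pandas as pd

# Frames rebuilt from the tables shown in the question.
df1 = pd.DataFrame({
    "col1": ["a", "b", "c", "d"],
    "col2": ["abc", "abc", "def", "abc"],
    "col3": [10, 20, 20, 30],
    "col4": ["str1", "str2", "str2", "str2"],
})
df2 = pd.DataFrame({
    "col1": ["d", "b", "c", "a"],
    "col2": ["abc", "abc", "def", "abc"],
    "col3": [30, 20, 20, 10],
    "col5": ["str6", "str5", "str7", "str5"],
    "col6": [47, 66, 53, 21],
})

# Assumed cause of the extra rows: joining on a non-unique key.
# "abc" appears 3 times on each side (3 * 3 = 9 pairs) plus 1 "def" pair.
too_many = df1[["col2"]].merge(df2[["col2", "col5"]], on="col2")
print(len(too_many))  # 10

# Joining on col1 and col2 together gives unique keys, so rows match 1:1.
# validate="one_to_one" raises MergeError if either side has duplicate keys.
df_merged = df1[["col1", "col2"]].merge(
    df2[["col1", "col2", "col5"]],
    on=["col1", "col2"],
    validate="one_to_one",
)
print(df_merged)
```

If the keys might be missing from df2 and the rows of df1 should still be kept, passing how="left" keeps every row of df1 and fills col5 with NaN where there is no match.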
Answer 2:
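# Assumes df1 is already sorted by col1 with a default 0..n-1 index.
# Sort df2 by col1 so its rows line up with df1, then copy every row.
df2_sorted = df2.sort_values('col1').reset_index(drop=True)
df_merged = pd.DataFrame()
df_merged['col1'] = df1['col1']
df_merged['col2'] = df1['col2']
df_merged['col5'] = df2_sorted['col5']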
Does that help with what you're looking for?
Source: https://stackoverflow.com/questions/57173240/merging-two-pandas-dataframes-on-multiple-columns