I\'m new to Pandas and I want to merge two datasets that have similar columns. The columns are going to each have some unique values compared to the other column, in addition to
I have unfortunately stumbled upon a similar problem which I see is now old. I solved it by using this function in a different way, applying it to the two original tables, even though there were no duplicates in these. This is an example (I apologize, I am not a professional programmer):
import pandas as pd
dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}
df1 = pd.DataFrame(dict1)
df1=df1.drop_duplicates()
df2 = pd.DataFrame(dict2)
df2=df2.drop_duplicates()
df=pd.merge(df1,df2)
print('df1:')
print( df1 )
print('df2:')
print( df2 )
print('df:')
print( df )
import pandas as pd
dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}
df1 = pd.DataFrame(dict1).reset_index()
df2 = pd.DataFrame(dict2).reset_index()
df = df1.merge(df2, on = 'A')
df = pd.DataFrame(df[df.index_x==df.index_y]['A'], columns=['A']).reset_index(drop=True)
print(df)
Output:
A
0 2
1 2
2 3
3 4
4 5
did you try df.drop_duplicates() ?
import pandas as pd
dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)
df=pd.merge(df1,df2)
df_new=df.drop_duplicates()
print df
print df_new
Seems that it gives the results that you want
dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}
df1 = pd.DataFrame(dict1)
df1['index'] = [i for i in range(len(df1))]
df2 = pd.DataFrame(dict2)
df2['index'] = [i for i in range(len(df2))]
df1.merge(df2).drop('index', 1, inplace = True)
The idea is to merge based on the matching indices as well as matching 'A' column values.
Previously, since the way merge works depends on matches, what happened is that the first 2 in df1 was matched to both the first and second 2 in df2, and the second 2 in df1 was matched to both the first and second 2 in df2 as well.
If you try this, you will see what I am talking about.
dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}
df1 = pd.DataFrame(dict1)
df1['index'] = [i for i in range(len(df1))]
df2 = pd.DataFrame(dict2)
df2['index'] = [i for i in range(len(df2))]
df1.merge(df2, on = 'A')