Pandas merge creates unwanted duplicate entries


I'm new to Pandas and I want to merge two datasets that have similar columns. Each column will have some values that the other column lacks, in addition to


4 Answers

说谎

2021-02-08 11:15
I ran into a similar problem, which I see is now an old question. I solved it by using drop_duplicates() in a different way: I applied it to the two original tables before merging, rather than to the merged result, even though my own tables had no duplicates. Here is an example (I apologize, I am not a professional programmer):
```
import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df1=df1.drop_duplicates()

df2 = pd.DataFrame(dict2)
df2=df2.drop_duplicates()

df=pd.merge(df1,df2)
print('df1:')
print( df1 )

print('df2:')
print( df2 )

print('df:')
print( df )
```
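If you are unsure whether the key column is really unique in both tables, merge itself can check this. The sketch below assumes a pandas version that supports the `validate` argument of `pd.merge` and reuses the frames from the example above; `keys_unique` is just an illustrative variable name.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

# validate='one_to_one' makes merge raise MergeError if the key
# is repeated in either frame, instead of silently multiplying rows.
try:
    pd.merge(df1, df2, on='A', validate='one_to_one')
    keys_unique = True
except MergeError:
    keys_unique = False

print(keys_unique)

# After dropping duplicates on both sides the same check passes.
deduped = pd.merge(df1.drop_duplicates(), df2.drop_duplicates(),
                   on='A', validate='one_to_one')
print(deduped)
```

Note that this only detects the problem; whether dropping the repeated rows is acceptable depends on whether they carry meaning in your data.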

心在旅途

2021-02-08 11:19

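```
import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

# reset_index() turns the row position into a regular column named 'index'
df1 = pd.DataFrame(dict1).reset_index()
df2 = pd.DataFrame(dict2).reset_index()

# merging on 'A' alone keeps both position columns as index_x and index_y
df = df1.merge(df2, on = 'A')
# keep only the pairs where both rows came from the same position
df = df.loc[df.index_x == df.index_y, ['A']].reset_index(drop=True)

print(df)
```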

Output: a single column A with the values 2, 2, 3, 4, 5, without the extra cross-matched rows.


悲&欢浪女

2021-02-08 11:19

Did you try df.drop_duplicates()?

```
import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

df=pd.merge(df1,df2)
df_new=df.drop_duplicates()
print(df)      # merged result, including the repeated rows
print(df_new)  # merged result with repeated rows removed
```

It seems to give the result you want.
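One caveat may be worth checking before relying on this: drop_duplicates() on the merged frame cannot tell rows that the merge multiplied apart from rows that were already repeated in the inputs, so it collapses both. A minimal sketch with the same frames as above:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

df = pd.merge(df1, df2)        # the two 2s on each side pair up 2 x 2 times
df_new = df.drop_duplicates()  # every value of A now appears once

print(len(df))
print(df_new['A'].tolist())
```

If the original tables are supposed to keep both 2s, this is not the same result as a row-for-row match.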


谎友^

2021-02-08 11:30

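```
import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df1['index'] = range(len(df1))  # row position as an extra merge key
df2 = pd.DataFrame(dict2)
df2['index'] = range(len(df2))

# merge uses both shared columns ('A' and 'index') as keys by default;
# drop() returns a new frame, so assign the result instead of using inplace
df = df1.merge(df2).drop(columns='index')
```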

The idea is to merge on the row position as well as on the value in column 'A', so that a row in df1 can only match the row at the same position in df2.
Without the extra key, merge pairs every matching row on the left with every matching row on the right: the first 2 in df1 is matched to both the first and second 2 in df2, and the second 2 in df1 is matched to both of them as well, which gives four rows for the value 2 instead of two.

If you try this, you will see what I am talking about.

```
import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df1['index'] = range(len(df1))
df2 = pd.DataFrame(dict2)
df2['index'] = range(len(df2))

# merging on 'A' only: index_x and index_y show which rows were paired
print(df1.merge(df2, on = 'A'))
```
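Matching on row position only works when both tables list their rows in the same order. As a variation on the same idea (not what the answer above does), one could key on how many times each value has already appeared instead, using `groupby(...).cumcount()`. This is a sketch under that assumption; the column name `occurrence` is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

# Number the repeats of each value within its own frame:
# the first 2 gets 0, the second 2 gets 1, every other value gets 0.
# 'occurrence' is a hypothetical helper column, dropped again below.
df1['occurrence'] = df1.groupby('A').cumcount()
df2['occurrence'] = df2.groupby('A').cumcount()

# The n-th 2 on the left can now only match the n-th 2 on the right.
merged = df1.merge(df2, on=['A', 'occurrence']).drop(columns='occurrence')
print(merged)
```

With these inputs the merged frame has five rows, one per original row, and the result does not depend on where the repeated values sit in each table.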
