Pandas merge creates unwanted duplicate entries

天涯浪人 2021-02-08 10:45

I'm new to Pandas and I want to merge two datasets that have similar columns. Each column is going to have some values that do not appear in the other column, in addition to

4 answers
  • 2021-02-08 11:15

    I ran into a similar problem, although I see this question is now old. I solved it by using drop_duplicates() in a different way: applying it to the two original tables before merging, even though in my case these contained no duplicates. Here is an example (I apologize, I am not a professional programmer):

    import pandas as pd
    
    dict1 = {'A':[2,2,3,4,5]}
    dict2 = {'A':[2,2,3,4,5]}
    
    df1 = pd.DataFrame(dict1)
    df1=df1.drop_duplicates()
    
    df2 = pd.DataFrame(dict2)
    df2=df2.drop_duplicates()
    
    df=pd.merge(df1,df2)
    print('df1:')
    print( df1 )
    
    print('df2:')
    print( df2 )
    
    print('df:')
    print( df )
    
  • 2021-02-08 11:19
    import pandas as pd
    
    dict1 = {'A':[2,2,3,4,5]}
    dict2 = {'A':[2,2,3,4,5]}
    
    df1 = pd.DataFrame(dict1).reset_index()
    df2 = pd.DataFrame(dict2).reset_index()
    
    df = df1.merge(df2, on = 'A')
    df = pd.DataFrame(df[df.index_x==df.index_y]['A'], columns=['A']).reset_index(drop=True)
    
    print(df)
    

    Output:

       A
    0  2
    1  2
    2  3
    3  4
    4  5
    
  • 2021-02-08 11:19

    Did you try df.drop_duplicates()?

    import pandas as pd
    
    dict1 = {'A':[2,2,3,4,5]}
    dict2 = {'A':[2,2,3,4,5]}
    
    df1 = pd.DataFrame(dict1)
    df2 = pd.DataFrame(dict2)
    
    df=pd.merge(df1,df2)
    df_new=df.drop_duplicates() 
    print(df)
    print(df_new)
    

    This seems to give the result you want.

  • 2021-02-08 11:30
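    import pandas as pd
    
    dict1 = {'A':[2,2,3,4,5]}
    dict2 = {'A':[2,2,3,4,5]}
    
    df1 = pd.DataFrame(dict1)
    df1['index'] = [i for i in range(len(df1))]  # row position, used as a second merge key
    df2 = pd.DataFrame(dict2)
    df2['index'] = [i for i in range(len(df2))]
    
    # merge on both shared columns ('A' and 'index'), then drop the helper column;
    # keep the returned frame, since drop(..., inplace=True) on a temporary discards the result
    df = df1.merge(df2).drop(columns='index')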
    

    The idea is to merge on the row position as well as on the value in column 'A', so that a row only matches the row at the same position in the other frame.
    Without the position column, merge pairs every row in df1 with every row in df2 that has the same key. The first 2 in df1 was matched to both the first and the second 2 in df2, and the second 2 in df1 was matched to both of them as well, which gives four rows for the value 2 instead of two.

    If you run the following, you will see what I mean:

    dict1 = {'A':[2,2,3,4,5]}
    dict2 = {'A':[2,2,3,4,5]}
    
    df1 = pd.DataFrame(dict1)
    df1['index'] = [i for i in range(len(df1))]
    df2 = pd.DataFrame(dict2)
    df2['index'] = [i for i in range(len(df2))]
    
    df1.merge(df2, on = 'A')
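
A note on the positional-index idea in the last answer: it relies on the repeated values sitting at the same row positions in both frames. Below is a minimal sketch of a variant that drops that assumption by numbering each repeat of a value with groupby('A').cumcount() and merging on the value plus that counter. The column name occ is made up for this illustration and does not come from any of the answers.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

# Plain merge: each 2 in df1 pairs with each 2 in df2 (2 x 2 = 4 rows),
# plus one row each for 3, 4 and 5.
plain = df1.merge(df2, on='A')

# 'occ' is a hypothetical helper column: 0 for the first time a value
# appears in 'A', 1 for the second time, and so on.
left = df1.assign(occ=df1.groupby('A').cumcount())
right = df2.assign(occ=df2.groupby('A').cumcount())

# Merging on the value and its occurrence number pairs the n-th 2 in df1
# only with the n-th 2 in df2.
paired = left.merge(right, on=['A', 'occ']).drop(columns='occ')

print(len(plain))            # 7
print(paired['A'].tolist())  # [2, 2, 3, 4, 5]
```

Whether this is what you want depends on whether the n-th occurrence on one side really corresponds to the n-th occurrence on the other, which only you can tell from your data.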
    
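If the goal is to notice the row multiplication as soon as it happens rather than clean it up afterwards, merge also accepts a validate argument. The sketch below assumes the key column is meant to be unique in both frames, which the question does not actually state, so treat it as an illustration only.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

# validate='one_to_one' makes merge raise instead of silently pairing
# every duplicate key on the left with every duplicate key on the right.
try:
    pd.merge(df1, df2, on='A', validate='one_to_one')
    duplicates_detected = False
except MergeError:
    duplicates_detected = True

# After de-duplicating both inputs (as in the first answer) the same
# check passes and each key appears exactly once.
deduped = pd.merge(df1.drop_duplicates(), df2.drop_duplicates(),
                   on='A', validate='one_to_one')

print(duplicates_detected)     # True
print(deduped['A'].tolist())   # [2, 3, 4, 5]
```

Note that de-duplicating first also removes the second 2, so it only fits when repeated keys carry no extra information.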