Pandas Join on String Datatype

孤城傲影 2020-12-18 01:44

I am trying to join two pandas DataFrames on an id field that is a string UUID. I get a ValueError:

ValueError: You are trying to merge on object and int64 column

1 Answer
  • 2020-12-18 02:32

    The on parameter only applies to the calling DataFrame!

    on: Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index.

    Even though you specify on='id', join uses the 'id' column of pdf, which has object dtype, and tries to match it against the index of outputsPdf, which holds integer values.

    If you need to join on non-index columns across two DataFrames, you can either set those columns as the index of each, or use merge, since the on parameter in pd.merge applies to both DataFrames.


    Example

    import pandas as pd
    
    df1 = pd.DataFrame({'id': ['1', 'True', '4'], 'vals': [10, 11, 12]})
    df2 = df1.copy()
    
    # Fails: df1['id'] (object) is matched against df2's default integer index
    df1.join(df2, on='id', how='left', rsuffix='_fs')
    

    ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

    On the other hand, these work:

    df1.set_index('id').join(df2.set_index('id'), how='left', rsuffix='_fs').reset_index()
    #     id  vals  vals_fs
    #0     1    10       10
    #1  True    11       11
    #2     4    12       12
    
    df1.merge(df2, on='id', how='left', suffixes=['', '_fs'])
    #     id  vals  vals_fs
    #0     1    10       10
    #1  True    11       11
    #2     4    12       12
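To confirm this diagnosis on your own data, it may help to compare the dtype of the caller's key column with the dtype of the other frame's index before joining. The sketch below uses hypothetical frames named left and right (not the pdf and outputsPdf from the question) and assumes the other frame still has its default integer index:

```python
import pandas as pd

# Hypothetical frames mirroring the example above: string keys, default integer index.
left = pd.DataFrame({'id': ['1', 'True', '4'], 'vals': [10, 11, 12]})
right = left.copy()

# join(on='id') matches left['id'] against right.index, not right['id'].
key_dtype = left['id'].dtype      # string-like key column
index_dtype = right.index.dtype   # integer dtype of the default index

# A mismatch here is what triggers the ValueError.
dtypes_match = key_dtype == index_dtype

# Moving the key into the index of `right` lets join(on='id') line up the keys.
joined = left.join(right.set_index('id'), on='id', how='left', rsuffix='_fs')
```

Here only the right-hand frame is re-indexed, so the result keeps the original row index of left and there is no need for reset_index afterwards.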
    