Trying to merge 2 dataframes but get ValueError

别那么骄傲 2020-11-30 04:15

These are my two dataframes saved in two variables:

> print(df.head())
>
          club_name  tr_jan  tr_dec  year
    0  ADO Den Haag    1368    1422          


        
  • 2020-11-30 04:50

    In one of your dataframes year is a string and in the other it is an int64. Convert it first and then join (e.g. df['year']=df['year'].astype(int) or, as RafaelC suggested, df.year.astype(int)).
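
    A minimal sketch of the error and of this fix. The question's data is cut off, so both frames below (including the second club and the rank column) are hypothetical. The only assumption that matters is that year is int64 in one frame and text in the other. The exact wording of the error message varies between pandas versions.

```python
import pandas as pd

# Hypothetical data: "year" is int64 in this frame...
transfers = pd.DataFrame({
    "club_name": ["ADO Den Haag", "Ajax"],
    "tr_jan": [1368, 2000],
    "year": [2018, 2018],
})
# ...but a string (object) column in this one, e.g. after reading it as text.
rankings = pd.DataFrame({
    "club_name": ["ADO Den Haag", "Ajax"],
    "rank": [12, 1],
    "year": ["2018", "2018"],
})

try:
    transfers.merge(rankings, on=["club_name", "year"])
except ValueError as err:
    # pandas refuses to match an int64 key against a string key
    print("merge failed:", err)

# Fix: convert the string column and assign the result back to the frame
rankings["year"] = rankings["year"].astype(int)
merged = transfers.merge(rankings, on=["club_name", "year"])
print(merged)
```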

    Edit: Also note the comment by Anderson Zhu: if you have None or other missing values in one of your dataframes, you need to use Int64 (pandas' nullable integer type) instead of int. See the reference here.
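
    A small sketch of the Int64 point, using a hypothetical year column read as text with one missing entry. A plain int column cannot represent a missing value, so the cast fails, whereas the nullable Int64 dtype stores it as <NA>. Running pd.to_numeric first, so the strings become numbers before the nullable cast, is an assumption of this sketch rather than part of the comment.

```python
import pandas as pd

# Hypothetical "year" column read as text, with one missing value
years = pd.Series(["2018", None, "2019"])

try:
    years.astype(int)  # a plain int column cannot hold a missing value
except (TypeError, ValueError) as err:
    print("astype(int) failed:", err)

# Parse the strings to numbers (None becomes NaN), then use nullable Int64
years_int = pd.to_numeric(years).astype("Int64")
print(years_int)
```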

  • 2020-11-30 05:00

    Additionally: when you save a dataframe in .csv format, the datetime (the year, in this specific case) is saved as object, so you need to convert it to integer when you do the merge. That is why the merge works when you load both dataframes from csv files, while the error above shows up if one dataframe is loaded from a csv file and the other is an existing in-memory dataframe. This is somewhat annoying, but easy to solve once you keep it in mind.
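
    A sketch of how a CSV round trip can leave the two frames with different key dtypes, assuming (hypothetically) that the in-memory year was built from a date as text. read_csv re-infers each column from the text in the file, so the reloaded year comes back as int64 while the in-memory one is still a string. The CSV lives in an io.StringIO buffer instead of a real file so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical in-memory frame whose year was derived from a date as text
live = pd.DataFrame({
    "club_name": ["ADO Den Haag"],
    "date": pd.to_datetime(["2018-12-01"]),
})
live["year"] = live["date"].dt.strftime("%Y")  # strings such as "2018"

# Round-trip through CSV (an in-memory buffer stands in for a real file)
buf = io.StringIO()
live.to_csv(buf, index=False)
buf.seek(0)
from_csv = pd.read_csv(buf)  # read_csv re-infers "year" from text as int64

# The key dtypes no longer agree, which is what makes merge raise ValueError
print(live["year"].dtype, from_csv["year"].dtype)

# Convert before merging so both key columns share one dtype
live["year"] = live["year"].astype("int64")
merged = live.merge(from_csv[["club_name", "year"]], on=["club_name", "year"])
print(merged)
```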

  • 2020-11-30 05:02

    This simple solution works for me:

        final = pd.concat([df, rankingdf], axis=1, sort=False)
    

    But you may need to drop some duplicate columns first.
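
    A sketch of the concat approach with hypothetical frames. With axis=1, pd.concat places the frames side by side by aligning their index labels rather than matching club_name or year, which is why the dtype mismatch no longer matters. It also means the result is only correct if both frames hold the rows in the same order under the same index. Dropping the repeated key columns from one frame is the duplicate-column step mentioned above.

```python
import pandas as pd

# Hypothetical frames whose rows are already in the same order
df = pd.DataFrame({
    "club_name": ["ADO Den Haag", "Ajax"],
    "tr_jan": [1368, 2000],
    "year": [2018, 2018],
})
rankingdf = pd.DataFrame({
    "club_name": ["ADO Den Haag", "Ajax"],
    "rank": [12, 1],
    "year": ["2018", "2018"],  # different dtype from df["year"]
})

# Drop the duplicated key columns, then glue the frames side by side.
# Rows are paired by index label (0 with 0, 1 with 1), not by club or year.
final = pd.concat(
    [df, rankingdf.drop(columns=["club_name", "year"])], axis=1, sort=False
)
print(final)
```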

  • 2020-11-30 05:05

    It happens when the common column has a different data type in each table.

    Example: in table1 you have date as a string, whereas in table2 you have date as a datetime. So before merging, you need to convert date to a common data type.
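
    A minimal sketch of this case with two hypothetical tables: merging a text date column with a datetime64 one raises ValueError, and converting the text column with pd.to_datetime first gives both keys the same dtype.

```python
import pandas as pd

# Hypothetical tables: "date" is text in table1 and datetime64 in table2
table1 = pd.DataFrame({"date": ["2020-11-30", "2020-12-01"], "a": [1, 2]})
table2 = pd.DataFrame({
    "date": pd.to_datetime(["2020-11-30", "2020-12-01"]),
    "b": [3, 4],
})

try:
    table1.merge(table2, on="date")
except ValueError as err:
    # pandas will not match text keys against datetime64 keys
    print("merge failed:", err)

# Convert the text column to datetime so both keys share one dtype
table1["date"] = pd.to_datetime(table1["date"])
merged = table1.merge(table2, on="date")
print(merged)
```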

  • 2020-11-30 05:06

    I found that in both of my dataframes the key column had the same type (str), but switching from join to merge solved the issue.
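
    One likely explanation, offered as an assumption since the original code is not shown: DataFrame.join(other, on=col) matches the caller's column against the other frame's index, not against a column of the other frame. If that frame still has its default integer index, a string key is compared with integers and pandas raises the same ValueError. merge(on=col) matches column to column instead, and set_index makes join do the same. The frames below are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"year": ["2018", "2019"], "a": [1, 2]})
right = pd.DataFrame({"year": ["2018", "2019"], "b": [3, 4]})

try:
    # Compares left["year"] (strings) with right.index (0, 1): dtype mismatch
    left.join(right, on="year", rsuffix="_right")
except ValueError as err:
    print("join failed:", err)

# merge matches the "year" column of both frames
merged = left.merge(right, on="year")

# join also works once "year" is moved into right's index
joined = left.join(right.set_index("year"), on="year")
print(merged)
print(joined)
```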

  • 2020-11-30 05:11

    @Arnon Rotem-Gal-Oz's answer is right for the most part, but I would like to point out the difference between df['year']=df['year'].astype(int) and df.year.astype(int). df.year.astype(int) returns a new Series and does not change the column in the dataframe, at least in pandas 0.24.2. df['year']=df['year'].astype(int) explicitly changes the type because it is an assignment. I would argue that this is the safest way to permanently change the dtype of a column.

    Example:

        df = pd.DataFrame({'Weed': ['green crack', 'northern lights', 'girl scout cookies'],
                           'Qty': [10, 15, 3]})
        df.dtypes    # Weed: object, Qty: int64

        df['Qty'].astype(str)
        df.dtypes    # Weed: object, Qty: int64

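    Even passing inplace=True does not help, because astype has no inplace parameter: older versions such as pandas 0.24 accepted the extra keyword and silently ignored it, while recent versions raise a TypeError. (For methods that do accept inplace=True, it is generally equivalent to an explicit assignment.)

        df['Qty'].astype(str, inplace=True)    # not an astype parameter: ignored in 0.24, TypeError in recent pandas
        df.dtypes    # Weed: object, Qty: int64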

    Now with the assignment:

        df['Qty'] = df['Qty'].astype(str)
        df.dtypes    # Weed: object, Qty: object
