Pandas to_sql fails on duplicate primary key

执笔经年 2020-12-28 18:13

I'd like to append to an existing table using the pandas df.to_sql() function.

I set if_exists='append', but my table has primary keys, so rows whose key already exists in the table make the insert fail.

6 answers
  • 2020-12-28 18:20

    I was still getting the IntegrityError with another answer's approach, which is strange, so I worked it backwards: look up each key first, and insert the row only if it is not already in the table:

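    import pandas as pd
    from sqlalchemy import text

    # Parameterized lookup avoids quoting bugs and SQL injection
    query = text("SELECT 1 FROM `Table_Name` WHERE `key` = :key")
    for pos, key in enumerate(df["Key"].tolist()):  # tolist() yields native Python values
        found = pd.read_sql(query, con=Engine, params={"key": key})
        if found.empty:  # key not in the table yet, so insert this row
            df.iloc[pos:pos + 1].to_sql(name="Table_Name", if_exists='append', con=Engine, index=False)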
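
    Running one SELECT per row gets slow for large frames. A minimal sketch of a set-based alternative: read the existing keys once, drop the rows whose key is already present with `isin`, drop duplicates within the batch itself, and append the rest in a single `to_sql` call. The helper `append_new_rows` and the key column name `id` are hypothetical, and the sketch assumes Python's built-in sqlite3 module (so it runs without a server) in place of `Engine`.

```python
import sqlite3

import pandas as pd


def append_new_rows(df, table_name, con, key_col="id"):
    """Append only the rows of df whose key is not already in table_name.

    Hypothetical helper: reads the existing keys once instead of issuing one
    SELECT per row. table_name and key_col are interpolated into the SQL,
    so they must be trusted identifiers.
    """
    existing = pd.read_sql(f"SELECT {key_col} FROM {table_name}", con)
    # Rows whose key is not in the table yet
    new_rows = df[~df[key_col].isin(existing[key_col])]
    # Also remove duplicates inside the incoming batch (keep the first one)
    new_rows = new_rows.drop_duplicates(subset=[key_col], keep="first")
    new_rows.to_sql(table_name, con, if_exists="append", index=False)
    return len(new_rows)
```

    This is not safe against concurrent writers: a key inserted by another process between the read and the append will still raise an IntegrityError.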
    
  • 2020-12-28 18:24

    Unfortunately, to_sql has no option to issue an "INSERT IGNORE". This is how I worked around that limitation to insert only the non-duplicate rows into the database (the DataFrame is named df):

    from sqlalchemy.exc import IntegrityError

    for i in range(len(df)):
        try:
            df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine, index=False)
        except IntegrityError:
            pass  # duplicate key: skip this row (or take any other action)
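
    A self-contained sketch of the same row-by-row pattern, assuming Python's built-in sqlite3 module instead of `Engine` so it runs anywhere. With a plain sqlite3 connection the driver's `sqlite3.IntegrityError` reaches the caller; with a SQLAlchemy engine you would catch `sqlalchemy.exc.IntegrityError` as above. The helper name `append_skipping_duplicates` is hypothetical.

```python
import sqlite3

import pandas as pd


def append_skipping_duplicates(df, table_name, con):
    """Insert df one row at a time, skipping rows that violate a unique key.

    Returns the number of rows that were actually inserted.
    """
    inserted = 0
    for i in range(len(df)):
        try:
            # Each one-row to_sql call runs in its own transaction
            df.iloc[i:i + 1].to_sql(table_name, con, if_exists="append", index=False)
            inserted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate key: skip this row
    return inserted
```

    Because every row is a separate transaction, this is simple but slow for large frames.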
    
  • 2020-12-28 18:25

    Pandas doesn't support editing the actual SQL syntax of the .to_sql method, so you might be out of luck. There are some experimental programmatic workarounds (for example, load the DataFrame into a SQLAlchemy object with CALCHIPAN and use SQLAlchemy for the transaction), but you may be better served by writing your DataFrame to a CSV file and loading it with an explicit MySQL statement.

    CALCHIPAN repo: https://bitbucket.org/zzzeek/calchipan/
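
    A minimal sketch of the CSV route, assuming MySQL: write the frame with `to_csv`, then load it with `LOAD DATA LOCAL INFILE ... IGNORE`, where the `IGNORE` keyword makes MySQL skip input rows that would duplicate an existing unique or primary key. The helper `export_for_load_data` is hypothetical, and actually executing the statement requires a MySQL connection with `local_infile` enabled, which is not shown.

```python
import pandas as pd


def export_for_load_data(df, csv_path, table_name):
    """Write df to csv_path and return a MySQL LOAD DATA statement for it.

    IGNORE skips rows whose unique/primary key already exists. csv_path and
    table_name are interpolated as-is, so they must be trusted. Assumes the
    CSV uses Unix line endings (MySQL's default LINES TERMINATED BY '\\n').
    """
    df.to_csv(csv_path, index=False)
    columns = ", ".join(f"`{c}`" for c in df.columns)
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}' IGNORE INTO TABLE `{table_name}` "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "IGNORE 1 LINES "  # skip the header row written by to_csv
        f"({columns})"
    )
```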

  • 2020-12-28 18:29

    Please note that if_exists='append' refers to whether the table exists and what to do in each case; if_exists has nothing to do with the contents of the table. See the docs here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html

    if_exists : {'fail', 'replace', 'append'}, default 'fail'
        fail: If table exists, do nothing.
        replace: If table exists, drop it, recreate it, and insert data.
        append: If table exists, insert data. Create if does not exist.
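
    A small demonstration of that point, using an in-memory SQLite database (chosen only so the example is self-contained): appending the same frame twice to a table with a primary key fails on the second call, because 'append' only decides what happens to the existing table, not to the rows already in it. The table name `items` is hypothetical.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# First append: the table exists, so the rows are simply inserted.
df.to_sql("items", con, if_exists="append", index=False)

# Second append: if_exists="append" does not look at the existing rows,
# so inserting the same primary keys again fails.
try:
    df.to_sql("items", con, if_exists="append", index=False)
    second_append_failed = False
except sqlite3.IntegrityError:
    second_append_failed = True

print(second_append_failed)  # prints True
```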

  • 2020-12-28 18:37

    Pandas currently has no option for this, but there is a GitHub issue requesting it. If you need this feature too, upvote it.
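
    Since pandas 0.24, `to_sql` does accept a callable for its `method` parameter, which lets you write the INSERT statement yourself. A minimal sketch for SQLite, where `INSERT OR IGNORE` skips rows that would violate a unique key; the function name `insert_or_ignore` is hypothetical. With a plain sqlite3 connection pandas passes the callable a cursor, so the statement is built by hand here. With a SQLAlchemy engine the callable receives a SQLAlchemy connection and `table.table` is the SQLAlchemy `Table`, so a MySQL variant could execute `sqlalchemy.insert(table.table).prefix_with("IGNORE")` with the rows as a list of dicts.

```python
import sqlite3

import pandas as pd


def insert_or_ignore(table, conn, keys, data_iter):
    """Custom to_sql method: INSERT OR IGNORE for a sqlite3 connection.

    pandas calls this as method(table, conn, keys, data_iter); with a sqlite3
    connection, conn is a cursor and table.name is the target table name.
    """
    columns = ", ".join(f'"{k}"' for k in keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = f'INSERT OR IGNORE INTO "{table.name}" ({columns}) VALUES ({placeholders})'
    conn.executemany(sql, list(data_iter))
    return conn.rowcount  # row count reported by the sqlite3 driver
```

    Usage: `df.to_sql("items", con, if_exists="append", index=False, method=insert_or_ignore)` inserts the new keys and silently skips the ones already present.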

  • 2020-12-28 18:39

    In my case, I was inserting new data into an empty table, but some of the incoming rows were duplicates of each other, which is almost the same issue. I considered fetching the existing data, merging it with the new data, and continuing from there, but that is not optimal and only works for small data, not for huge tables.

    Since pandas does not currently handle this situation, I looked for a workaround and made my own. I'm not sure whether it will work for you, but I decided to control my data up front rather than hope the insert succeeds: I remove duplicates before calling .to_sql, so if an error still happens, I know more about my data and what is going on:

    import pandas as pd
    
    
    def write_to_table(table_name, data, engine):
        df = pd.DataFrame(data)
        # Sort by price so the lowest-priced row for each id_key comes first
        df = df.sort_values('price')
        # Keep only that first (lowest-priced) row per id_key
        df = df.drop_duplicates(subset=['id_key'], keep='first')
        df.to_sql(table_name, engine, index=False, if_exists='append', schema='public')
    

    In my case I wanted to keep the lowest-priced row for each key (I was passing a list of dicts as data), so I sorted first. Sorting isn't required for deduplication, but it is an example of what I mean by controlling which data I keep.

    I hope this helps someone in a similar situation.
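
    The deduplication step can be checked on its own, without a database. A short sketch with hypothetical sample rows (the `id_key` and `price` names follow the snippet above; the helper `lowest_price_per_key` is hypothetical): a stable sort by price followed by `drop_duplicates(keep='first')` keeps the cheapest row for each key. Note that this only removes duplicates within the batch; keys that already exist in the table will still raise an IntegrityError on append.

```python
import pandas as pd


def lowest_price_per_key(data):
    """Keep one row per id_key: the one with the lowest price."""
    df = pd.DataFrame(data)
    # mergesort is stable, so rows with equal prices keep their input order
    df = df.sort_values("price", kind="mergesort")
    df = df.drop_duplicates(subset=["id_key"], keep="first")
    return df.sort_values("id_key").reset_index(drop=True)


rows = [
    {"id_key": "a", "price": 5.0},
    {"id_key": "b", "price": 2.0},
    {"id_key": "a", "price": 3.0},
]
print(lowest_price_per_key(rows))
```

    For these rows, key a keeps its 3.0 row and key b keeps its only row at 2.0.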
