Pandas to_sql() to update unique values in DB?

我是研究僧i 提交于 2019-12-07 16:45:49

问题


How can I use the df.to_sql(if_exists = 'append') to append ONLY the unique values between the dataframe and the database. In other words, I would like to evaluate the duplicates between the DF and the DB and drop those duplicates before writing to the database.

Is there a parameter for this?

I understand that the parameters if_exists = 'append' and if_exists = 'replace'is for the entire table - not the unique entries.

I am using: 
sqlalchemy

pandas dataframe with the following datatypes: 
    index: datetime.datetime <-- Primary Key
    float
    float
    float
    float
    integer
    string <---  Primary Key
    string<----  Primary Key

I'm stuck on this so your help is much appreciated. -Thanks


回答1:


In pandas, there is no convenient argument in to_sql to append only non-duplicates to a final table. Consider using a staging temp table that pandas always replaces and then run a final append query to migrate temp table records to final table accounting only for unique PK's using the NOT EXISTS clause.

engine = sqlalchemy.create_engine(...)

df.to_sql(name='myTempTable', con=engine, if_exists='replace')

with engine.begin() as cn:
   sql = """INSERT INTO myFinalTable (Col1, Col2, Col3, ...)
            SELECT t.Col1, t.Col2, t.Col3, ...
            FROM myTempTable t
            WHERE NOT EXISTS 
                (SELECT 1 FROM myFinalTable f
                 WHERE t.MatchColumn1 = f.MatchColumn1
                 AND t.MatchColumn2 = f.MatchColumn2)"""

   cn.execute(sql)

This would be an ANSI SQL solution and not restricted to vendor-specific methods like UPSERT and so is compliant in practically all SQL-integrated relational databases.



来源:https://stackoverflow.com/questions/52188446/pandas-to-sql-to-update-unique-values-in-db

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!