Multi-row UPSERT (INSERT or UPDATE) from Python

我寻月下人不归 2021-01-12 19:12

I am currently executing the simple query below from Python, using pyodbc, to insert data into a SQL Server table:

import pyodbc

table_name = 'my_table'
insert_         


        
3 answers
  •  鱼传尺愫
    2021-01-12 19:40

    Following up on the existing answers here: they are potentially prone to SQL injection, so it is better to use parameterized queries (for mssql/pyodbc, the placeholders are "?"). I tweaked Alexander Novas's code slightly so that the dataframe rows are passed as parameters to the query through sqlalchemy:

    # assuming you already have a dataframe "df" and sqlalchemy engine called "engine"
    # also assumes your dataframe columns have all the same names as the existing table
    
    table_name_to_update = 'update_table'
    table_name_to_transfer = 'placeholder_table'
    
    # the dataframe and existing table should both have a column to use as the primary key
    primary_key_col = 'id'
    
    # replace the placeholder table with the dataframe
    df.to_sql(table_name_to_transfer, engine, if_exists='replace', index=False)
    
    # building the command terms
    cols_list = df.columns.tolist()
    cols_list_query = f'({(", ".join(cols_list))})'
    sr_cols_list = [f'Source.{i}' for i in cols_list]
    sr_cols_list_query = f'({(", ".join(sr_cols_list))})'
    up_cols_list = [f'{i}=Source.{i}' for i in cols_list]
    up_cols_list_query = f'{", ".join(up_cols_list)}'
        
    import pandas as pd

    # strings that should be stored as NULL; tweak this set to your needs
    NULL_STRINGS = {'NULL', 'nan', '', '-', '?'}

    # replace values that should be interpreted as "NULL" with None
    def fill_null(vals: list) -> tuple:
        def bad(val):
            # pd.isna catches None, float NaN, pd.NA and pd.NaT; a plain
            # "val in [np.nan]" test can miss NaN because NaN != NaN
            if pd.isna(val):
                return True
            return isinstance(val, str) and val in NULL_STRINGS
        return tuple(None if bad(i) else i for i in vals)
    
    # create the list of parameter indicators (?, ?, ?, etc...)
    # and the parameters, which are the values to be inserted
    params = [fill_null(row.tolist()) for _, row in df.iterrows()]
    param_slots = '('+', '.join(['?']*len(df.columns))+')'
        
    cmd = f'''
           MERGE INTO {table_name_to_update} as Target
           USING (SELECT * FROM
           (VALUES {param_slots})
           AS s {cols_list_query}
           ) AS Source
           ON Target.{primary_key_col}=Source.{primary_key_col}
           WHEN NOT MATCHED THEN
           INSERT {cols_list_query} VALUES {sr_cols_list_query} 
           WHEN MATCHED THEN
           UPDATE SET {up_cols_list_query};
           '''
    
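    # execute the parameterized MERGE once per row (an executemany call);
    # exec_driver_sql (SQLAlchemy 1.4+) hands the "?" placeholders straight to pyodbc
    with engine.begin() as conn:
        conn.exec_driver_sql(cmd, params)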
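
The NULL handling in the code above depends on a detail that is easy to miss: a membership test such as `val in [np.nan]` falls back on equality, and NaN is never equal to itself, so NaN values read out of a dataframe can slip through. Here is a minimal stdlib-only sketch of the idea. `NULL_STRINGS`, `to_sql_null` and `normalize_row` are illustrative names, not part of the original answer, and `pd.NA` is not covered here (with pandas available, `pd.isna` handles it).

```python
import math

# Strings to store as NULL; an assumption, adjust to your data.
NULL_STRINGS = {"NULL", "nan", "", "-", "?"}


def to_sql_null(val):
    """Return None for values that should become SQL NULL, else the value."""
    if val is None:
        return None
    if isinstance(val, float) and math.isnan(val):
        return None
    if isinstance(val, str) and val in NULL_STRINGS:
        return None
    return val


def normalize_row(row):
    """Convert one row (any iterable of values) into a parameter tuple."""
    return tuple(to_sql_null(v) for v in row)


# A freshly created NaN is a different object from the one in the list,
# and NaN != NaN, so the membership test does not find it.
fresh_nan = float("nan")
print(fresh_nan in [float("nan")])
print(normalize_row([1, fresh_nan, "O'Brien", "-", 0]))
```

Note that the integer 0 is kept as a value; only floats that are NaN and the listed strings are turned into None.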
    

    This method is also better if you are inserting strings containing characters that would break a hand-built INSERT statement (such as apostrophes), because the driver handles the parameter values separately from the SQL text. That separation is also what makes it safer against SQL injection attacks.
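
To make the point about apostrophes concrete, here is a small sketch. It uses the standard-library `sqlite3` module purely as a stand-in so that it runs without a SQL Server; that choice and the `people` table are my assumptions, not part of the original answer. pyodbc uses the same "?" placeholder style.

```python
import sqlite3

# sqlite3 is only a stand-in here so the sketch runs without a SQL Server;
# pyodbc uses the same "?" (qmark) placeholder style.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")

rows = [
    (1, "O'Brien"),                     # apostrophe would break a hand-built string
    (2, "x'); DROP TABLE people; --"),  # injection attempt, stored as plain text
    (3, None),                          # None is sent as NULL
]

# The values travel separately from the SQL text, so no quoting is needed.
conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
print(stored)
```

The rows come back exactly as they went in, including the apostrophe and the would-be injection string.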

    For reference, I'm creating the engine connection using this code - you'll obviously need to adapt it to your server/database/environment and whether or not you want fast_executemany:

    import urllib.parse
    import pyodbc
    pyodbc.pooling = False  # turn off ODBC-level pooling; SQLAlchemy pools connections itself
    import sqlalchemy
    
    terms = urllib.parse.quote_plus(
                'DRIVER={SQL Server Native Client 11.0};'
                'SERVER=;'
                'DATABASE=;'
                'Trusted_Connection=yes;' # to logon using Windows credentials
    )
    
    url = f'mssql+pyodbc:///?odbc_connect={terms}'
    engine = sqlalchemy.create_engine(url, fast_executemany=True)
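
For anyone wondering why the connection string goes through `quote_plus`: the raw ODBC string contains characters (`;`, `=`, `{`, spaces) that cannot appear unescaped in a URL query parameter. Below is a small sketch of the encoding step on its own. The server and database names are made up for illustration, so substitute your own.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical server and database names, for illustration only.
odbc_str = (
    "DRIVER={SQL Server Native Client 11.0};"
    "SERVER=my-server;"
    "DATABASE=my_db;"
    "Trusted_Connection=yes;"
)

# Percent-encode the whole ODBC string so it can sit inside the URL.
encoded = quote_plus(odbc_str)
url = f"mssql+pyodbc:///?odbc_connect={encoded}"
print(url)

# Decoding gives back the original ODBC connection string.
decoded = unquote_plus(encoded)
```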
    

    EDIT: I realized that this code does not actually make use of the "placeholder" table at all, and is just copying values directly from the dataframe rows by way of the parameterized command.
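
If you do want to use the staging table written by `df.to_sql`, the MERGE can read from that table directly instead of from a VALUES list. Below is a rough sketch of how such a statement could be assembled. `quote_ident` and `build_staging_merge` are hypothetical helpers, not part of the original answer, and I have not run the generated SQL against a server. Table and column names cannot be bound as "?" parameters, so the sketch validates them before interpolating.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_ident(name: str) -> str:
    """Allow only simple identifiers and wrap them in T-SQL brackets."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f"[{name}]"


def build_staging_merge(target: str, staging: str, key: str, cols: list) -> str:
    """Build a MERGE that upserts every row of `staging` into `target`."""
    q = [quote_ident(c) for c in cols]
    k = quote_ident(key)
    insert_cols = ", ".join(q)
    insert_vals = ", ".join(f"Source.{c}" for c in q)
    # The key column is left out of the UPDATE; it already matches.
    update_set = ", ".join(f"{c} = Source.{c}" for c in q if c != k)
    if not update_set:
        raise ValueError("need at least one non-key column to update")
    return (
        f"MERGE INTO {quote_ident(target)} AS Target\n"
        f"USING {quote_ident(staging)} AS Source\n"
        f"ON Target.{k} = Source.{k}\n"
        f"WHEN NOT MATCHED THEN\n"
        f"    INSERT ({insert_cols}) VALUES ({insert_vals})\n"
        f"WHEN MATCHED THEN\n"
        f"    UPDATE SET {update_set};"
    )


sql = build_staging_merge(
    "update_table", "placeholder_table", "id", ["id", "name", "price"]
)
print(sql)
```

With this variant the data reaches the server through `df.to_sql`, and the MERGE itself takes no parameters, so it could presumably be run with something like `conn.exec_driver_sql(sql)` inside the same `engine.begin()` block.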
