Multi-row UPSERT (INSERT or UPDATE) from Python

我寻月下人不归 2021-01-12 19:12

I am currently executing the simple query below in Python, using pyodbc, to insert data into a SQL Server table:

import pyodbc

table_name = 'my_table'
insert_


        
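The snippet above is cut off, but for a plain multi-row insert the usual pyodbc pattern is a parameterized statement plus `executemany`. A minimal sketch (`my_table` and the `id`/`val` columns are illustrative, and the connection itself is not shown):

```python
# sketch only: table and column names are placeholders for your schema
table_name = 'my_table'
rows = [(1, 'a'), (2, 'b'), (3, 'c')]

# one "?" parameter marker per column, so pyodbc handles quoting/escaping
placeholders = ', '.join('?' * len(rows[0]))
sql = f"INSERT INTO {table_name} (id, val) VALUES ({placeholders})"

# with a live pyodbc connection this would run as:
# cursor.executemany(sql, rows); cnxn.commit()
```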
3 Answers
  • 2021-01-12 19:23

    Given a dataframe (df), I used the code from ksbg to upsert into a table. Note that I matched on two columns (date and stationcode), but you can use just one. The code generates the query for any df.

    def append(df, c):
        table_name = 'ddf.ddf_actuals'

        # build the column-list fragments for the MERGE statement
        columns_list = df.columns.tolist()
        columns_list_query = f'({",".join(columns_list)})'
        sr_columns_list = [f'Source.{i}' for i in columns_list]
        sr_columns_list_query = f'({",".join(sr_columns_list)})'
        up_columns_list = [f'{i}=Source.{i}' for i in columns_list]
        up_columns_list_query = ','.join(up_columns_list)

        # render the dataframe rows as a literal VALUES list
        rows_to_insert = [row.tolist() for idx, row in df.iterrows()]
        rows_to_insert = str(rows_to_insert).replace('[', '(').replace(']', ')')[1:-1]

        query = f"MERGE INTO {table_name} AS Target \
    USING (SELECT * FROM \
    (VALUES {rows_to_insert}) \
    AS s {columns_list_query} \
    ) AS Source \
    ON Target.stationcode=Source.stationcode AND Target.date=Source.date \
    WHEN NOT MATCHED THEN \
    INSERT {columns_list_query} VALUES {sr_columns_list_query} \
    WHEN MATCHED THEN \
    UPDATE SET {up_columns_list_query};"
        c.execute(query)
        c.commit()
    
    
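    To sanity-check what this builds, the fragment logic can be exercised on plain lists standing in for df.columns and df.iterrows() (the column names here are just the ones used above):

```python
# hypothetical columns/rows standing in for df.columns / df.iterrows()
columns_list = ['stationcode', 'date', 'value']
rows = [['A', '2021-01-01', 1], ['B', '2021-01-02', 2]]

columns_list_query = f'({",".join(columns_list)})'
up_columns_list_query = ','.join(f'{i}=Source.{i}' for i in columns_list)

# same bracket-to-paren trick as in append(): render the nested lists,
# swap [] for (), and strip the outermost pair
rows_to_insert = str(rows).replace('[', '(').replace(']', ')')[1:-1]
```

    Note that because the values are interpolated as literals, this only works for data that needs no escaping; the parameterized approach in the last answer avoids that limitation.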
  • 2021-01-12 19:34

    This can be done using MERGE. Let's say you have a key column ID and two columns col_a and col_b (you need to spell out the column names in the UPDATE clause); the statement would then look like this:

    MERGE INTO MyTable as Target
    USING (SELECT * FROM 
           (VALUES (1, 2, 3), (2, 2, 4), (3, 4, 5)) 
           AS s (ID, col_a, col_b)
          ) AS Source
    ON Target.ID=Source.ID
    WHEN NOT MATCHED THEN
    INSERT (ID, col_a, col_b) VALUES (Source.ID, Source.col_a, Source.col_b)
    WHEN MATCHED THEN
    UPDATE SET col_a=Source.col_a, col_b=Source.col_b;
    

    You can give it a try on rextester.com/IONFW62765.

    Basically, I'm creating a Source table "on-the-fly" using the list of values, which you want to upsert. When you then merge the Source table with the Target, you can test the MATCHED condition (Target.ID=Source.ID) on each row (whereas you would be limited to a single row when just using a simple IF <exists> INSERT (...) ELSE UPDATE (...) condition).

    In python with pyodbc, it should probably look like this:

    import pyodbc
    
    insert_values = [(1, 2, 3), (2, 2, 4), (3, 4, 5)]
    table_name = 'my_table'
    key_col = 'ID'
    col_a = 'col_a'
    col_b = 'col_b'
    
    cnxn = pyodbc.connect(...)
    cursor = cnxn.cursor()
    cursor.execute(('MERGE INTO {table_name} as Target '
                    'USING (SELECT * FROM '
                    '(VALUES {vals}) '
                    'AS s ({k}, {a}, {b}) '
                    ') AS Source '
                    'ON Target.{k}=Source.{k} '
                    'WHEN NOT MATCHED THEN '
                    'INSERT ({k}, {a}, {b}) VALUES (Source.{k}, Source.{a}, Source.{b}) '
                    'WHEN MATCHED THEN '
                    'UPDATE SET {a}=Source.{a}, {b}=Source.{b};'
                    .format(table_name=table_name,
                            vals=','.join([str(i) for i in insert_values]),
                            k=key_col,
                            a=col_a,
                            b=col_b)))
    cursor.commit()
    

    You can read up more on MERGE in the SQL Server docs.
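    As a side note, the vals fragment above is just the Python tuples rendered back to text, which is why this form is only safe for trusted, numeric data:

```python
# how the VALUES list is built from the tuples in the answer above
insert_values = [(1, 2, 3), (2, 2, 4), (3, 4, 5)]
vals = ','.join(str(i) for i in insert_values)
```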

  • 2021-01-12 19:40

    Following up on the existing answers here because they are potentially prone to injection attacks and it's better to use parameterized queries (for mssql/pyodbc, these are the "?" placeholders). I tweaked Alexander Novas's code slightly to use dataframe rows in a parameterized version of the query with sqlalchemy:

    # assuming you already have a dataframe "df" and a sqlalchemy engine called "engine"
    # also assumes your dataframe columns all have the same names as the existing table
    import numpy as np
    import pandas as pd
    
    table_name_to_update = 'update_table'
    table_name_to_transfer = 'placeholder_table'
    
    # the dataframe and existing table should both have a column to use as the primary key
    primary_key_col = 'id'
    
    # replace the placeholder table with the dataframe
    df.to_sql(table_name_to_transfer, engine, if_exists='replace', index=False)
    
    # building the command terms
    cols_list = df.columns.tolist()
    cols_list_query = f'({(", ".join(cols_list))})'
    sr_cols_list = [f'Source.{i}' for i in cols_list]
    sr_cols_list_query = f'({(", ".join(sr_cols_list))})'
    up_cols_list = [f'{i}=Source.{i}' for i in cols_list]
    up_cols_list_query = f'{", ".join(up_cols_list)}'
        
    # fill values that should be interpreted as "NULL" with None
    def fill_null(vals: list) -> tuple:
        def bad(val):
            # pd.isna catches pd.NA, np.nan and None in one check
            if pd.isna(val):
                return True
            # the list of values you want to interpret as 'NULL' should be
            # tweaked to your needs
            return val in ['NULL', 'nan', '', '-', '?']
        return tuple(i if not bad(i) else None for i in vals)
    
    # create the list of parameter indicators (?, ?, ?, etc...)
    # and the parameters, which are the values to be inserted
    params = [fill_null(row.tolist()) for _, row in df.iterrows()]
    param_slots = '('+', '.join(['?']*len(df.columns))+')'
        
    cmd = f'''
           MERGE INTO {table_name_to_update} as Target
           USING (SELECT * FROM
           (VALUES {param_slots})
           AS s {cols_list_query}
           ) AS Source
           ON Target.{primary_key_col}=Source.{primary_key_col}
           WHEN NOT MATCHED THEN
           INSERT {cols_list_query} VALUES {sr_cols_list_query} 
           WHEN MATCHED THEN
           UPDATE SET {up_cols_list_query};
           '''
    
    # execute the command to merge tables (SQLAlchemy 1.x style; on 1.4+/2.x
    # use conn.exec_driver_sql(cmd, params) for a raw "?"-style statement)
    with engine.begin() as conn:
        conn.execute(cmd, params)
    

    This method is also better if you are inserting strings with characters that aren't compatible with SQL insert text (such as apostrophes which mess up the insert statement) since it lets the connection engine handle the parameterized values (which also makes it safer against SQL injection attacks).
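    Concretely, each dataframe row becomes one parameter tuple and the statement carries one "?" slot per column, so apostrophes and NULLs never touch the SQL text (a small stdlib-only sketch; the column count is illustrative):

```python
# hypothetical 3-column row set: one "?" slot per column, one parameter
# tuple per row; the driver binds the values, so no escaping is needed
params = [(1, "O'Brien", None), (2, 'plain', 'x')]
n_cols = len(params[0])
param_slots = '(' + ', '.join(['?'] * n_cols) + ')'
```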

    For reference, I'm creating the engine connection using this code - you'll obviously need to adapt it to your server/database/environment and whether or not you want fast_executemany:

    import urllib
    import pyodbc
    pyodbc.pooling = False
    import sqlalchemy
    
    terms = urllib.parse.quote_plus(
                'DRIVER={SQL Server Native Client 11.0};'
                'SERVER=<your server>;'
                'DATABASE=<your database>;'
                'Trusted_Connection=yes;'  # to log on using Windows credentials
    )

    url = f'mssql+pyodbc:///?odbc_connect={terms}'
    engine = sqlalchemy.create_engine(url, fast_executemany=True)
    

    EDIT: I realized that this code does not actually make use of the "placeholder" table at all, and is just copying values directly from the dataframe rows by way of the parameterized command.
