How to do an upsert with SQLAlchemy?

别跟我提以往 2020-12-08 01:37

I have a record that I want to insert into the database if it is not already there; if it is already there (the primary key exists), I want its fields updated to the current state.

7 answers
  • 2020-12-08 02:07

    SQLAlchemy does have a "save-or-update" behavior, which in recent versions has been built into session.add, but previously was the separate session.save_or_update call. This is not an "upsert", but it may be good enough for your needs.

    It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, it would be a simple enough problem: if nothing with the given ID exists, or if ID is None, create a new record; else update all other fields in the existing record with that primary key.

    However, when there are additional unique constraints, there are logical issues with that simple approach. If you want to "upsert" an object, and the primary key of your object matches an existing record, but another unique column matches a different record, then what do you do? Similarly, if the primary key matches no existing record, but another unique column does match an existing record, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.

    That would be the reason there is no built in "upsert" operation. The application must define what this means in each particular case.
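
    For the simple case described above (primary key only, no other unique constraints), the ORM's Session.merge() comes close: it looks the row up by primary key, then issues an INSERT or an UPDATE. A minimal sketch, assuming SQLAlchemy 1.4+ and an in-memory SQLite database; the User model is hypothetical. Note that merge() is a SELECT followed by a write, not an atomic database-level upsert, and it only considers the primary key.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):  # hypothetical model, for illustration only
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    # No row with id=1 yet: merge() adds a new object, flushed as an INSERT.
    session.merge(User(id=1, name="first"))
    session.commit()

    # id=1 now exists: merge() copies the new state onto the loaded row,
    # which is flushed as an UPDATE.
    session.merge(User(id=1, name="second"))
    session.commit()

    names = [u.name for u in session.query(User).all()]
```

    After both calls the table holds a single row for id=1 carrying the second name.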

  • 2020-12-08 02:09

    SQLAlchemy's PostgreSQL dialect supports ON CONFLICT through two methods on its insert() construct: on_conflict_do_update() and on_conflict_do_nothing().

    Copying from the documentation:

    from sqlalchemy.dialects.postgresql import insert
    
    stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
    stmt = stmt.on_conflict_do_update(
        index_elements=[my_table.c.user_email],
        index_where=my_table.c.user_email.like('%@gmail.com'),
        set_=dict(data=stmt.excluded.data)
    )
    conn.execute(stmt)
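
    To see what such a statement sends to the server, it can be compiled against the PostgreSQL dialect without connecting to anything. A small sketch; the table definition here is an assumption made up for illustration (the documentation snippet does not show one), and the partial-index condition is left out for brevity.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()
# Hypothetical table; the unique constraint on user_email is the
# conflict target that ON CONFLICT refers to.
my_table = Table(
    "my_table",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_email", String, unique=True),
    Column("data", String),
)

stmt = insert(my_table).values(user_email="a@b.com", data="inserted data")
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    # "excluded" refers to the row that was proposed for insertion
    set_={"data": stmt.excluded.data},
)

# Render the statement for PostgreSQL without executing it.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

    The printed SQL is an INSERT followed by an ON CONFLICT (user_email) DO UPDATE clause that assigns excluded.data to the data column.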
    
  • 2020-12-08 02:10

    This works for me with sqlite3 and postgres. However, it might fail with composite primary keys and will most likely fail with additional unique constraints.

        try:
            t = self._meta.tables[data['table']]
        except KeyError:
            self._log.error('table "%s" unknown', data['table'])
            return
    
        # Needs: from sqlalchemy import insert, update
        #        from sqlalchemy.exc import IntegrityError
        try:
            # Try a plain INSERT first.
            q = insert(t).values(data['values'])
            self._log.debug(q)
            self._db.execute(q)
        except IntegrityError:
            # The row already exists: UPDATE it by primary key instead.
            self._log.warning('integrity error')
            where_clause = [c == data['values'][c.name] for c in t.c if c.primary_key]
            update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
            q = update(t).where(*where_clause).values(update_dict)
            self._log.debug(q)
            self._db.execute(q)
        except Exception as e:
            self._log.error('%s: %s', t.name, e)
    
  • 2020-12-08 02:16

    I use a "look before you leap" approach:

    # first get the object from the database if it exists
    # we're guaranteed to only get one or zero results
    # because we're filtering by primary key
    switch_command = session.query(Switch_Command).\
        filter(Switch_Command.switch_id == switch.id).\
        filter(Switch_Command.command_id == command.id).first()
    
    # If we didn't get anything, make one
    if not switch_command:
        switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)
    
    # update the stuff we care about
    switch_command.output = 'Hooray!'
    switch_command.lastseen = datetime.datetime.utcnow()
    
    session.add(switch_command)
    # This will generate either an INSERT or UPDATE
    # depending on whether we have a new object or not
    session.commit()
    

    The advantage is that this is db-neutral and I think it's clear to read. The disadvantage is that there's a potential race condition in a scenario like the following:

    • we query the db for a switch_command and don't find one
    • we create a switch_command
    • another process or thread creates a switch_command with the same primary key as ours
    • we try to commit our switch_command
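
    One way to cope with that race is to treat an IntegrityError on commit as "somebody else inserted it first" and retry as an update. A rough sketch, assuming SQLAlchemy 1.4+ (for Session.get) and a simplified, hypothetical SwitchCommand model; save_output is an illustrative name, not part of the code above.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SwitchCommand(Base):  # hypothetical, simplified model
    __tablename__ = "switch_command"
    switch_id = Column(Integer, primary_key=True)
    command_id = Column(Integer, primary_key=True)
    output = Column(String)


def save_output(session, switch_id, command_id, output):
    """Look before you leap, retrying as an UPDATE if the INSERT loses a race."""
    key = (switch_id, command_id)
    row = session.get(SwitchCommand, key)
    if row is None:
        row = SwitchCommand(switch_id=switch_id, command_id=command_id)
        session.add(row)
    row.output = output
    try:
        session.commit()
    except IntegrityError:
        # Another writer inserted the same key after our SELECT.
        session.rollback()
        row = session.get(SwitchCommand, key)
        row.output = output
        session.commit()
    return row


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    save_output(session, 1, 2, "first")    # no row yet: INSERT
    save_output(session, 1, 2, "Hooray!")  # row exists: UPDATE
    rows = [(r.switch_id, r.command_id, r.output)
            for r in session.query(SwitchCommand).all()]
```

    Keep in mind that rollback() also discards any other pending changes in the session, so this works best when the upsert is the only work in the transaction.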
  • 2020-12-08 02:24

    Nowadays, SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. Those functions are useful but require you to switch from the ORM interface to the lower-level one, SQLAlchemy Core.

    Although those two functions make upserting using SQLAlchemy's syntax not that difficult, these functions are far from providing a complete out-of-the-box solution to upserting.

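    My common use case is to upsert a big chunk of rows in a single SQL query/session execution. I usually encounter two problems with upserting:

    First, the higher-level ORM functionality we've gotten used to is missing: you cannot pass ORM objects and instead have to provide foreign key values at the time of insertion. Second, judging by the deduplication step in the function below, rows within one batch that repeat the same unique-constraint values have to be filtered out before the statement runs.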

    I'm using this following function I wrote to handle both of those issues:

    from sqlalchemy import UniqueConstraint, inspect
    from sqlalchemy.dialects import postgresql
    
    def upsert(session, model, rows):
        table = model.__table__
        stmt = postgresql.insert(table)
        primary_keys = [key.name for key in inspect(table).primary_key]
        # On conflict, overwrite every non-key column with the proposed value
        update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}
    
        if not update_dict:
            raise ValueError("upsert resulted in an empty update_dict")
    
        stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                          set_=update_dict)
    
        seen = set()
        foreign_keys = {col.name: list(col.foreign_keys)[0].column for col in table.columns if col.foreign_keys}
        unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]
        def handle_foreignkeys_constraints(row):
            for c_name, c_value in foreign_keys.items():
                foreign_obj = row.pop(c_value.table.name, None)
                row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None
    
            for const in unique_constraints:
                unique = tuple([const,] + [row[col.name] for col in const.columns])
                if unique in seen:
                    return None
                seen.add(unique)
    
            return row
    
        rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
        session.execute(stmt, rows)
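
    The core of the batch upsert can be tried without a PostgreSQL server. The sketch below is a stripped-down variant, not the function above: it swaps in the SQLite dialect's insert() (assuming SQLAlchemy 1.4+ and SQLite 3.24+), leaves out the foreign key and deduplication handling, and uses a hypothetical Item model and bulk_upsert name.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.dialects.sqlite import insert
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):  # hypothetical model
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String)


def bulk_upsert(session, model, rows):
    """Upsert a list of dicts, keyed on the model's primary key."""
    table = model.__table__
    stmt = insert(table)
    pk_names = [col.name for col in table.primary_key]
    # On conflict, overwrite every non-key column with the incoming value.
    updates = {
        c.name: stmt.excluded[c.name]
        for c in table.columns
        if not c.primary_key
    }
    stmt = stmt.on_conflict_do_update(index_elements=pk_names, set_=updates)
    # A list of dicts makes this an "executemany" style call.
    session.execute(stmt, rows)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    bulk_upsert(session, Item, [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
    bulk_upsert(session, Item, [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}])
    session.commit()
    result = sorted((i.id, i.name) for i in session.query(Item).all())
```

    The second call updates id=2 and inserts id=3, leaving three rows in total.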
    
  • 2020-12-08 02:24

    This allows access to the underlying models based on their string names:

    def get_class_by_tablename(tablename):
      """Return class reference mapped to table.
      https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to
      :param tablename: String with name of table.
      :return: Class reference or None.
      """
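      # _decl_class_registry exists in SQLAlchemy < 1.4; on 1.4+ iterate
      # Base.registry.mappers and check mapper.class_ instead.
      for c in Base._decl_class_registry.values():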
        if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
          return c
    
    
    sqla_tbl = get_class_by_tablename(table_name)
    
    from sqlalchemy.exc import IntegrityError
    
    def handle_upsert(session, record_dict, table):
        """
        Insert a record, updating the existing row on a primary key conflict.
    
        `session` is the active Session; `table` is the mapped class.
        """
        try:
            session.add(table(**record_dict))
            # add() alone emits no SQL; flush so a conflict is raised here
            session.flush()
        except IntegrityError:
            # SQLAlchemy wraps the driver's (pyodbc / sqlite3) integrity
            # errors in sqlalchemy.exc.IntegrityError. Drivers that report
            # conflicts under another error class need extra handling.
            session.rollback()
            # Query for the conflicting row, then update its values from the dict
            pk_names = [col.name for col in table.__table__.primary_key]  # primary key column names
            columns = table.__table__.columns  # column name -> Column object
    
            # Column object -> value, for the primary key columns present in the data
            pk_filter = {columns[k]: record_dict[k] for k in pk_names if k in record_dict}
    
            target_record = session.query(table).filter(*[col == v for col, v in pk_filter.items()]).first()
    
            # apply new data values to the existing record
            for k, v in record_dict.items():
                setattr(target_record, k, v)
    