Querying objects added to a non committed session in SQLAlchemy

萌比男神i · 2021-01-24 06:31

So I asked this question without much context and got downvoted; let's try again...

For one, I don't follow the logic behind SQLAlchemy's session.add...

1 Answer

被撕碎了的回忆 · 2021-01-24 07:17

    The reason why flushes in bauble end up emitting a COMMIT is line 133 of db.py, where they handle their history table:

    table.insert(dict(table_name=mapper.local_table.name,
                      table_id=instance.id, values=str(row),
                      operation=operation, user=user,
                      timestamp=datetime.datetime.today())).execute()
    

    Instead of issuing the additional SQL in the event handler using the passed in transactional connection, as they should, they execute the statement itself as is, which means it ends up using the engine as the bind (found through the table's metadata). Executing using the engine has autocommit behaviour. Since bauble always uses a SingletonThreadPool, there's just one connection per thread, and so that statement ends up committing the flushed changes as well. I wonder if this bug is the reason why bauble disables autoflush...

    The fix is to change the event handling to use the transactional connection:

    class HistoryExtension(orm.MapperExtension):
        """
        HistoryExtension is a
        :class:`~sqlalchemy.orm.interfaces.MapperExtension` that is added
        to all classes that inherit from bauble.db.Base so that all
        inserts, updates, and deletes made to the mapped objects are
        recorded in the `history` table.
        """
        def _add(self, operation, mapper, connection, instance):
            """
            Add a new entry to the history table.
            """
            ...  # a ton of code here
            table = History.__table__
            stmt = table.insert(dict(table_name=mapper.local_table.name,
                                     table_id=instance.id, values=str(row),
                                     operation=operation, user=user,
                                     timestamp=datetime.datetime.today()))
            connection.execute(stmt)
    
        def after_update(self, mapper, connection, instance):
            self._add('update', mapper, connection, instance)
    
        def after_insert(self, mapper, connection, instance):
            self._add('insert', mapper, connection, instance)
    
        def after_delete(self, mapper, connection, instance):
            self._add('delete', mapper, connection, instance)
    

    It's worth noting that MapperExtension has been deprecated since SQLAlchemy 0.7.
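    In modern SQLAlchemy the same fix is expressed with mapper-level events instead of MapperExtension. Below is a minimal sketch of that pattern, not bauble's actual code: the History and Plant models are hypothetical stand-ins, and only the insert case is shown. The crucial detail is unchanged, though: the listener executes on the transactional connection it receives, so the history row joins the flush's transaction instead of autocommitting through the engine.

```python
import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class History(Base):
    __tablename__ = "history"
    id = Column(Integer, primary_key=True)
    table_name = Column(String)
    operation = Column(String)
    timestamp = Column(DateTime)

class Plant(Base):
    __tablename__ = "plant"
    id = Column(Integer, primary_key=True)
    name = Column(String)

@event.listens_for(Plant, "after_insert")
def record_insert(mapper, connection, target):
    # Execute on the flush's own transactional connection: no autocommit,
    # the history row is part of the surrounding transaction.
    connection.execute(
        History.__table__.insert().values(
            table_name=mapper.local_table.name,
            operation="insert",
            timestamp=datetime.datetime.today(),
        )
    )

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Plant(name="rose"))
    session.flush()      # INSERTs emitted, the listener fires -- but no COMMIT
    session.rollback()   # the plant and its history row are rolled back together
```

    With bauble's buggy version, the rollback would have come too late: the implicit engine-level execution in the listener would already have committed everything.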


    Regarding your views about the session, I quote "Session Basics", which you really should read through:

    In the most general sense, the Session establishes all conversations with the database and represents a “holding zone” for all the objects which you’ve loaded or associated with it during its lifespan. It provides the entrypoint to acquire a Query object, which sends queries to the database using the Session object’s current database connection, ...

    and "Is the Session a cache?":

    Yeee…no. It’s somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn’t do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar'), even if Foo(name='bar') is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It’s only when you say query.get({some primary key}) that the Session doesn’t have to issue a query.

    So:

    My expectation from something which reads session.query is that it queries the session...

    Your expectations are wrong. The Session handles talking to the DB – among other things.

    There's two reasons why I don't want to do session.flush before my session.query,

    • one based on efficiency fears (why should I write to the database, and query the database if I am still within a session which the user may want to rollback?);

    Because your DB may do validation, have triggers, and generate values for some columns – primary keys, timestamps, and the like. The data you thought you were inserting may end up as something else in the DB, and the Session has absolutely no way to know about that.

    Also, why should SQLAlchemy implement a sort of in-memory DB in itself, with its own query engine, and all the problems that come with synchronizing two databases? How would SQLAlchemy support all the different operations and functions of the different DBs you query against? Your simple equality-predicate example just scratches the surface.

    When you rollback, you roll back the DB's transaction (along with the session's unflushed changes).
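    Both points can be seen in a few lines. The Note model below is a hypothetical example: before the flush the Session genuinely does not know the database-generated primary key, and a rollback undoes the flushed INSERT along with the rest of the transaction, so flushing costs you nothing in terms of the user's ability to back out.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Note(Base):
    __tablename__ = "note"
    id = Column(Integer, primary_key=True)  # value generated by the database
    body = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    note = Note(body="hello")
    session.add(note)
    assert note.id is None   # the Session cannot know the generated key yet
    session.flush()          # INSERT sent; the key comes back from the DB
    assert note.id == 1
    session.rollback()       # the whole transaction, flush included, is undone
```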

    • two is because I've adopted a fairly large program, and it manages to define its own Session whose instances cause flush to also commit.

    Caused by the event handling bug.
