SQLAlchemy bulk update in MySQL is very slow

日久生厌 2021-01-05 20:49

I'm using SQLAlchemy 1.0.0 and want to run some update-only queries (update if the primary key matches, otherwise do nothing) in batch.

I've made s

1 answer
  • 2021-01-05 21:37

    You can speed up bulk update operations with a trick, even if the connection to the database server (as in your case) has very high latency. Instead of updating your table directly, you insert the new data into a stage table, which is very fast, and then run a single join-update against the destination table. This also dramatically reduces the number of statements you have to send to the database.

    How does this work with UPDATEs?

    Say you have a table entries and new data is coming in all the time, but you only want to update the rows that have already been stored. You create a stage table, entries_stage, that copies only the relevant fields of the destination table:

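    from sqlalchemy import Column, Integer, MetaData, Table, Unicode

    metadata = MetaData()

    # Destination table
    entries = Table('entries', metadata,
        Column('id', Integer, autoincrement=True, primary_key=True),
        Column('value', Unicode(64), nullable=False),
    )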
    
    entries_stage = Table('entries_stage', metadata,
        Column('id', Integer, autoincrement=False, unique=True),
        Column('value', Unicode(64), nullable=False),
    )
    

    Then you insert your data with a bulk insert. This can be sped up even further with MySQL's multi-row INSERT syntax, which sends many rows in a single statement. SQLAlchemy Core's insert() can render this form when values() is given a list of dictionaries, or you can build the statement yourself without much difficulty.

    INSERT INTO entries_stage (`id`, `value`)
    VALUES
    (1, 'string1'), (2, 'string2'), (3, 'string3'), ...;
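
    A multi-row statement like the one above can be built by hand with a small helper. The following is only a sketch: the function name build_multi_insert is hypothetical, and it assumes a DB-API driver such as PyMySQL or mysqlclient, which take %s placeholders. So that it runs without a MySQL server, the demo executes the statement against in-memory SQLite, which uses ? placeholders instead.

```python
import sqlite3


def build_multi_insert(table, columns, rows, placeholder="%s"):
    """Return (sql, params) for one multi-row INSERT statement.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input. Values are always passed
    as bound parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    row_sql = "(" + ", ".join([placeholder] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_sql] * len(rows))
    )
    # Flatten the rows so the values line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


rows = [(1, "string1"), (2, "string2"), (3, "string3")]

# Statement in the form a MySQL driver would receive it.
mysql_sql, mysql_params = build_multi_insert(
    "entries_stage", ["id", "value"], rows
)

# Demo run against in-memory SQLite (assumption: ? placeholder style).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries_stage (id INTEGER UNIQUE, value TEXT NOT NULL)"
)
sqlite_sql, sqlite_params = build_multi_insert(
    "entries_stage", ["id", "value"], rows, placeholder="?"
)
conn.execute(sqlite_sql, sqlite_params)
staged = conn.execute(
    "SELECT id, value FROM entries_stage ORDER BY id"
).fetchall()
```

    For very large batches you would probably split rows into chunks, so that each statement stays below the server's max_allowed_packet limit.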
    

    In the end, you update the values of the destination-table with the values from the stage-table like this:

     UPDATE entries e
     JOIN entries_stage es ON e.id = es.id
     SET e.value = es.value;
    

    Then you're done.
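
    The whole update path (empty the stage table, bulk-insert, one set-based update) can be sketched end to end. This is an illustration under stated assumptions, not the exact code for MySQL: it runs on in-memory SQLite so it is self-contained, and because SQLite has no MySQL-style UPDATE ... JOIN, the last step uses an equivalent correlated subquery. Against MySQL you would send the join-update instead. The function name update_via_stage is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    CREATE TABLE entries_stage (id INTEGER UNIQUE, value TEXT NOT NULL);
    INSERT INTO entries (id, value) VALUES (1, 'old1'), (2, 'old2');
""")


def update_via_stage(conn, rows):
    """Stage the rows, then update matching entries with one statement."""
    with conn:  # one transaction: commit on success, roll back on error
        # Start from an empty stage table.
        conn.execute("DELETE FROM entries_stage")
        conn.executemany(
            "INSERT INTO entries_stage (id, value) VALUES (?, ?)", rows
        )
        # SQLite stand-in for the MySQL statement:
        #   UPDATE entries e
        #   JOIN entries_stage es ON e.id = es.id
        #   SET e.value = es.value;
        conn.execute("""
            UPDATE entries
            SET value = (SELECT es.value FROM entries_stage es
                         WHERE es.id = entries.id)
            WHERE id IN (SELECT id FROM entries_stage)
        """)


# id 3 does not exist in entries, so it is staged but never written there.
update_via_stage(conn, [(1, "new1"), (3, "new3")])
result = conn.execute("SELECT id, value FROM entries ORDER BY id").fetchall()
```

    Only the row with id 1 changes. The row with id 2 is untouched, and id 3 is ignored because it has no match in entries, which is the update-only behaviour asked for in the question.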

    What about inserts?

    Of course, this also works to speed up inserts. As the data is already in the stage table, all you need to do is issue an INSERT INTO ... SELECT statement that selects the rows which are not in the destination table yet.

    INSERT INTO entries (id, value)
    SELECT es.id, es.value
    FROM entries_stage es
    LEFT JOIN entries e ON e.id = es.id
    WHERE e.id IS NULL;
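
    The anti-join insert can be tried out in isolation. The sketch below again uses in-memory SQLite as a stand-in for MySQL (an assumption made so that it runs anywhere). The statement is an INSERT ... SELECT with a LEFT JOIN that keeps only the staged rows without a match in entries. The sample rows are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    CREATE TABLE entries_stage (id INTEGER UNIQUE, value TEXT NOT NULL);
    INSERT INTO entries (id, value) VALUES (1, 'string1');
    INSERT INTO entries_stage (id, value)
    VALUES (1, 'changed'), (2, 'string2'), (3, 'string3');
""")

with conn:  # commit on success, roll back on error
    # Copy over only the staged rows whose id is missing from entries.
    conn.execute("""
        INSERT INTO entries (id, value)
        SELECT es.id, es.value
        FROM entries_stage es
        LEFT JOIN entries e ON e.id = es.id
        WHERE e.id IS NULL
    """)

result = conn.execute("SELECT id, value FROM entries ORDER BY id").fetchall()
```

    The existing row with id 1 keeps its old value, because this statement only inserts. Changing it is the job of the join-update.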
    

    The nice thing about this is that you don't need INSERT IGNORE, REPLACE or ON DUPLICATE KEY UPDATE, which can advance the table's AUTO_INCREMENT counter even when they end up inserting nothing.
