Question
I'm writing a Python script to insert a large amount of data into a PostgreSQL database.
Following the PostgreSQL documentation, in order to speed up the load process, my script has this sort of structure:
- Connect to the database and create a cursor
- Drop all the indexes
- Load all the data using the COPY command
- Recreate all the indexes
- Commit, then close the cursor and the connection (this is the only commit in the whole script)
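The steps above could be sketched roughly as follows. This is only an illustration of the single-commit structure, not the actual script: the function name, the `index_ddl` mapping and the table name are hypothetical, `conn` is assumed to be an open psycopg2 connection, and identifiers are assumed to come from trusted code because they are interpolated directly into the SQL.

```python
def load_in_one_transaction(conn, table, index_ddl, data_file):
    """Drop indexes, COPY the data, recreate the indexes, commit once.

    conn      -- an open psycopg2 connection (assumption)
    table     -- target table name (trusted identifier)
    index_ddl -- hypothetical mapping: index name -> CREATE INDEX statement
    data_file -- file-like object holding rows in COPY text format
    """
    cur = conn.cursor()
    try:
        # Drop every index listed in the mapping.
        for name in index_ddl:
            cur.execute(f"DROP INDEX IF EXISTS {name}")
        # Bulk-load the rows with COPY ... FROM STDIN.
        cur.copy_expert(f"COPY {table} FROM STDIN", data_file)
        # Recreate the indexes from their stored definitions.
        for ddl in index_ddl.values():
            cur.execute(ddl)
        # The only commit in the whole script.
        conn.commit()
    except Exception:
        # Nothing was committed, so the drops are undone as well.
        conn.rollback()
        raise
    finally:
        cur.close()
```

With psycopg2 it might be called with a connection from `psycopg2.connect(...)`, an open data file, and a mapping such as `{"items_name_idx": "CREATE INDEX items_name_idx ON items (name)"}`.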
So my question is: does dropping the indexes before the commit have any effect on the loading speed, or not?
Answer 1:
commit just commits any transaction in progress to your database.
What you are actually asking is whether dropping the indexes and then copying within the same transaction gives the same speedup as first dropping the indexes in one transaction and then copying the data in a new transaction.
The relevant passage in the docs says:
If you are adding large amounts of data to an existing table, it might be a win to drop the indexes, load the table, and then recreate the indexes. Of course, the database performance for other users might suffer during the time the indexes are missing. One should also think twice before dropping a unique index, since the error checking afforded by the unique constraint will be lost while the index is missing.
The sentence about performance for other users indirectly implies that you should commit after dropping the indexes, since dropping indexes without committing (completing the transaction) should not have any impact on other users of the database.
So the solution should be something along these lines:
drop your indexes, commit, copy the data, create new indexes and commit again.
Note that when you split your transaction into two transactions, you lose atomicity. That is, it is possible that your indexes are dropped but no data is copied (for example, if power or the network is lost during the copying transaction), and the indexes would then never be recreated.
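The two-transaction sequence suggested above could be sketched like this. It is a minimal illustration under assumptions, not a definitive implementation: the function name and the `index_ddl` mapping are hypothetical, `conn` is assumed to be an open psycopg2 connection, and identifiers are assumed to be trusted. The `except` branch is an addition of this sketch: a best-effort attempt to rebuild the indexes if the load fails. It cannot help when the connection itself is gone (power or network loss), which is exactly the atomicity risk described above.

```python
def load_in_two_transactions(conn, table, index_ddl, data_file):
    """Drop indexes and commit, then COPY and recreate indexes and commit.

    conn      -- an open psycopg2 connection (assumption)
    table     -- target table name (trusted identifier)
    index_ddl -- hypothetical mapping: index name -> CREATE INDEX statement
    data_file -- file-like object holding rows in COPY text format
    """
    cur = conn.cursor()
    try:
        # Transaction 1: drop the indexes and commit the drop.
        for name in index_ddl:
            cur.execute(f"DROP INDEX IF EXISTS {name}")
        conn.commit()

        # Transaction 2: load the data, rebuild the indexes, commit again.
        try:
            cur.copy_expert(f"COPY {table} FROM STDIN", data_file)
            for ddl in index_ddl.values():
                cur.execute(ddl)
            conn.commit()
        except Exception:
            # Discard the partial load; the committed drop is NOT undone.
            conn.rollback()
            # Best effort: put the indexes back before re-raising.
            for ddl in index_ddl.values():
                cur.execute(ddl)
            conn.commit()
            raise
    finally:
        cur.close()
```

If the rebuild in the `except` branch fails as well, its error replaces the original one and the table is left without indexes, so a real script would probably also want to log which indexes are missing.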
Source: https://stackoverflow.com/questions/44902428/to-drop-an-index-with-psycopg2-takes-effect-before-or-after-commit