I wonder what the fastest way to accomplish this kind of task in PostgreSQL is. I am interested in the fastest possible solutions.
I found such a solution for MySQL; it performs much faster than just truncating the tables one by one. Still, I am interested in the fastest solutions for MySQL too. See my result here (it is for MySQL only): https://github.com/bmabey/database_cleaner/issues/126
I have the following assumptions:
- I have 30-100 tables. Let them be 30.
- Half of the tables are empty.
- Each non-empty table has, say, no more than 100 rows; that is, the tables are NOT large.
I need the option to exclude 2, 5, or N tables from this procedure.
I cannot use transactions.
I need the fastest cleaning strategy for such a case that works on both PostgreSQL 8 and 9.
I see the following approaches:
Truncate each table. This is too slow, I think, especially for empty tables.
Check each table for emptiness by a faster method first; if it is empty, reset its unique identifier column (the analog of AUTO_INCREMENT in MySQL) to its initial state (1), i.e. restore the sequence's last_value back to 1; otherwise run TRUNCATE on it.
I use Ruby code to iterate through all tables, calling the code below on each of them. I tried to set up SQL code to run against each table like:
DO $$
DECLARE
  seq_value bigint;
BEGIN
  SELECT last_value INTO seq_value FROM #{table}_id_seq;
  IF seq_value = 1 THEN
    -- restore the initial unique identifier column value here
    NULL;
  ELSIF seq_value > 1 THEN
    TRUNCATE TABLE #{table};
  END IF;
END$$;
After trying this code in various forms, I could not make it work, because I am unfamiliar with PostgreSQL functions and blocks (and variables).
My guess was also that EXISTS(SELECT something FROM table) could serve as one of the "check procedure" units the cleaning procedure should consist of, but I have not accomplished that either.
I would appreciate any hints on how this procedure could be accomplished in a PostgreSQL-native way.
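For what it's worth, the EXISTS probe mentioned above can be generated per table from Ruby. A minimal sketch (the helper name is illustrative, not from any library):

```ruby
# EXISTS stops at the first row it finds, so this probe is cheap even
# on non-empty tables. The table name is interpolated as in the
# question; it must come from a trusted list, never from user input.
def empty_check_sql(table)
  "SELECT NOT EXISTS (SELECT 1 FROM #{table}) AS is_empty"
end
```

The query returns a single boolean row; a table would then be cleaned only when is_empty is false or its sequence has advanced.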
UPDATE:
I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or do a cleanup after itself (a so-called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers; in my case a switch to the truncation strategy is needed. Now that I have mentioned RoR, please do not post answers like "Obviously, you need DatabaseCleaner for PG" and so on.
UPDATE 2:
The strategy described here was recently merged into DatabaseCleaner (https://github.com/bmabey/database_cleaner) as the :pre_count option (see the README there).
PostgreSQL can truncate many tables in a single TRUNCATE TABLE
statement.
Don't bother iterating and just do
TRUNCATE TABLE table1,table2,table3,...,table30;
See also the discussion of why truncation can be slower on Pg, and why DELETE isn't the same thing.
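Building that single statement from a Ruby table list might look like this sketch (the helper name is illustrative; identifier quoting is naive, and RESTART IDENTITY, which also resets attached sequences, requires PostgreSQL 8.4+):

```ruby
# Build one TRUNCATE statement covering many tables, optionally
# excluding some (e.g. schema_migrations). Returns nil when there is
# nothing to truncate.
def build_truncate_sql(tables, exclude: [])
  targets = tables - exclude
  return nil if targets.empty?
  quoted = targets.map { |t| %("#{t}") }.join(', ')
  # RESTART IDENTITY is PostgreSQL 8.4+; drop it on older servers
  "TRUNCATE TABLE #{quoted} RESTART IDENTITY;"
end
```

One round trip for all 30 tables, instead of 30 separate statements.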
As requested in the comment (although I don't feel this is the right answer, it's too long for a comment):
There is no (noticeable) performance difference between truncating an empty table and truncating a large table.
As stated in the manual (http://www.postgresql.org/docs/current/static/sql-truncate.html), "it does not actually scan the tables".
So if you first check whether there are any rows in the table, you will scan the table; that scan won't happen if you simply issue the TRUNCATE without bothering whether it is empty.
[I don't know RoR]
A nice way to start with a clean slate is to create and use a temp SCHEMA:
DROP SCHEMA fuzz CASCADE;
CREATE SCHEMA fuzz;
SET search_path='fuzz';
(This is what I use to test SQL snippets.) But this would create an empty schema, and you cannot copy schemas, AFAIK.
The other way would be to create your database (including the empty tables) and use that as a template for constructing the test-rig:
DROP DATABASE testdb;
CREATE DATABASE testdb TEMPLATE my_spcial_template;
The problem with that is that you cannot drop the database while there are still connections to it (including the connection issuing the DROP itself). So your front-end should first disconnect, then temporarily connect to some other DB (such as my_spcial_template), then dropdb+createdb, then connect to testdb. I don't know about performance, but at least it is a robust scheme.
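The drop-and-recreate step can be sketched as plain SQL built from Ruby (the helper name is illustrative; DROP DATABASE IF EXISTS is available from PostgreSQL 8.2, and both statements must be issued from a session connected to some other database, e.g. the template):

```ruby
# SQL to rebuild `target` from `template`. CREATE DATABASE ... TEMPLATE
# copies the template's tables, so the test DB starts with the full
# (empty) schema already in place.
def rebuild_db_sql(target, template)
  ["DROP DATABASE IF EXISTS #{target}",
   "CREATE DATABASE #{target} TEMPLATE #{template}"]
end
```

Feed the two statements to psql or a driver connection one at a time; CREATE DATABASE cannot run inside a transaction block.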
If someone is interested in the current strategy I use for this, see this Ruby-based repo, https://github.com/stanislaw/truncate-vs-count, covering both MySQL and PostgreSQL.
My results:
MySQL: the fastest strategy for cleaning databases is truncation with the following modifications:
if table is not empty
  truncate
else
  if AUTO_INCREMENT is not 0
    truncate
  end
end
- For MySQL, just truncation is much faster than just deletion. The only case where DELETE wins over TRUNCATE is on an empty table.
- For MySQL, truncation with empty checks is much faster than just multiple truncation.
- For MySQL, deletion with empty checks is much faster than just DELETE on each table.
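The AUTO_INCREMENT check can be read from information_schema. A sketch (the helper name and the database/table arguments are placeholders; a value of NULL or 1 means the counter never advanced, so the truncate can be skipped):

```ruby
# Build the query that reads a MySQL table's AUTO_INCREMENT counter
# from information_schema. Names are interpolated for brevity and must
# come from a trusted list.
def auto_increment_sql(db, table)
  "SELECT auto_increment FROM information_schema.tables " \
    "WHERE table_schema = '#{db}' AND table_name = '#{table}'"
end
```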
PostgreSQL: the fastest strategy for cleaning databases is deletion with the same empty checks as for MySQL, but relying on currval instead:
if table is not empty
  delete table
else
  if currval is not 0
    delete table
  end
end
- For PostgreSQL, just deletion is much faster than just TRUNCATE (even multiple-table TRUNCATE).
- For PostgreSQL, multiple TRUNCATE with empty checks done first is slightly faster than just multiple TRUNCATE.
- For PostgreSQL, deletion with empty checks is slightly faster than just deletion.
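The per-table PostgreSQL step can be sketched as follows. Note that currval raises an error until nextval has been called in the session, so reading the sequence's last_value directly is a common substitute; the `#{table}_id_seq` naming is the Rails convention and an assumption here, and DO blocks need PostgreSQL 9.0+:

```ruby
# DELETE only when the table has rows or its id sequence has advanced,
# then reset the sequence; mirrors the pseudocode above.
def pg_cleanup_sql(table)
  <<~SQL
    DO $$
    BEGIN
      IF EXISTS (SELECT 1 FROM #{table})
         OR (SELECT last_value FROM #{table}_id_seq) > 1 THEN
        DELETE FROM #{table};
        ALTER SEQUENCE #{table}_id_seq RESTART WITH 1;
      END IF;
    END$$;
  SQL
end
```

On untouched tables this costs one cheap EXISTS probe and one sequence read, which is where the measured speedup comes from.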
This is where it began: https://github.com/bmabey/database_cleaner/issues/126
This is the result code and long discussion: https://github.com/bmabey/database_cleaner/pull/127
This is discussion on pgsql-performance mailing list: http://archives.postgresql.org/pgsql-performance/2012-07/msg00047.php
We began collecting user feedback confirming that my idea of first checking for empty tables is right.