I'm using COPY to insert large batches of data into our database from CSVs. The insert looks something like this:
-- This tmp table will contain all the items that we want to try to insert
CREATE TEMP TABLE tmp_items
(
    field1 INTEGER NULL,
    field2 INTEGER NULL,
    ...
) ON COMMIT DROP;

COPY tmp_items (
    field1,
    field2,
    ...
) FROM 'path\to\data.csv' WITH (FORMAT csv);
-- Start inserting some items
WITH newitems AS (
    INSERT INTO items (field1, field2)
    SELECT tmpi.field1, tmpi.field2
    FROM tmp_items tmpi
    WHERE <some condition>
    -- Return the new id and other fields to the next step
    RETURNING id AS newid, field1
)
-- Insert the result into another temp table
INSERT INTO tmp_newitems SELECT * FROM newitems;
-- Use tmp_newitems to update other tables
etc....
We will then use the data in tmp_items to do multiple inserts into multiple tables. We check for duplicates and manipulate the data in a few ways before inserting, so not everything in tmp_items will be used or inserted as is. We do this with a combination of CTEs and more temporary tables.
This works very well and is fast enough for our needs. We run a lot of these batches, and the problem we have is that pg_attribute is becoming very bloated quite fast, and autovacuum doesn't seem to be able to keep up (and consumes a lot of CPU).
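For reference, a quick way to watch the bloat from psql (these are just the standard size functions and statistics views; the numbers will of course depend on your workload):

-- Total size of pg_attribute, including indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));

-- Dead tuples and the last autovacuum run on pg_attribute
SELECT n_dead_tup, last_autovacuum
FROM pg_stat_sys_tables
WHERE relname = 'pg_attribute';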
My questions are:
- Is it possible to perform this kind of insert without using temp tables?
- If not, should we just make autovacuum on pg_attribute more aggressive? Won't that take up as much or more CPU?
The best solution would be to create your temporary tables at session start with
CREATE TEMPORARY TABLE ... (
...
) ON COMMIT DELETE ROWS;
Then the temporary tables are kept for the duration of the session but emptied at every commit. This will reduce the bloat of pg_attribute considerably, and bloating shouldn't be a problem any more.
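Adapted to the workflow from the question, that could look roughly like this (only a sketch; the table and column names come from the question, while the column types and the filter condition are placeholders I made up):

-- Once per session: create the work tables; they survive across transactions
CREATE TEMPORARY TABLE IF NOT EXISTS tmp_items (
    field1 INTEGER NULL,
    field2 INTEGER NULL
) ON COMMIT DELETE ROWS;

CREATE TEMPORARY TABLE IF NOT EXISTS tmp_newitems (
    newid  INTEGER,
    field1 INTEGER
) ON COMMIT DELETE ROWS;

-- Per batch: load, transform, insert; the COMMIT empties the temp tables
BEGIN;

COPY tmp_items (field1, field2) FROM 'path\to\data.csv' WITH (FORMAT csv);

WITH newitems AS (
    INSERT INTO items (field1, field2)
    SELECT tmpi.field1, tmpi.field2
    FROM tmp_items tmpi
    WHERE tmpi.field1 IS NOT NULL  -- stand-in for the real duplicate/validity checks
    RETURNING id AS newid, field1
)
INSERT INTO tmp_newitems SELECT * FROM newitems;

-- ... further inserts/updates in other tables using tmp_newitems ...

COMMIT;  -- ON COMMIT DELETE ROWS clears tmp_items and tmp_newitems here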
You could also join the dark side (be warned, this is unsupported):
Start PostgreSQL with
pg_ctl start -o -O
so that you can modify system catalogs.
Connect as superuser and run
UPDATE pg_catalog.pg_class SET reloptions = ARRAY['autovacuum_vacuum_cost_delay=0'] WHERE oid = 'pg_catalog.pg_attribute'::regclass;
Now autovacuum will run much more aggressively on pg_attribute, and that will probably take care of your problem.
Mind that the setting will be gone after a major upgrade.
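To verify that the setting stuck, you can check the catalog afterwards (this part is just an ordinary query):

SELECT relname, reloptions
FROM pg_catalog.pg_class
WHERE oid = 'pg_catalog.pg_attribute'::regclass;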
I know this is an old question, but somebody might find this helpful in the future.
We are very heavy on temp tables, with >500 rps and async I/O via Node.js, and experienced very heavy bloating of pg_attribute because of that. All you are left with is very aggressive vacuuming, which halts performance. The other answers given here do not solve this, because dropping and recreating temp tables bloats pg_attribute heavily, and one sunny morning you will find database performance dead and pg_attribute at 200+ GB while your database would be something like 10 GB.
So the solution is, elegantly, this:
CREATE TEMP TABLE IF NOT EXISTS my_temp_table (<column definitions>) ON COMMIT DELETE ROWS;
That way you go on playing with temp tables, save your pg_attribute, skip the dark-side heavy vacuuming, and get the desired performance.
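In practice the pattern looks something like this (a sketch only; my_temp_table and its columns are made up for illustration):

-- Safe to issue on every request: within a session only the first call
-- actually creates the table, so pg_attribute is not churned per batch
CREATE TEMP TABLE IF NOT EXISTS my_temp_table (
    id      INTEGER,
    payload TEXT
) ON COMMIT DELETE ROWS;

BEGIN;
INSERT INTO my_temp_table (id, payload) VALUES (1, 'example');
-- ... work with my_temp_table ...
COMMIT;  -- ON COMMIT DELETE ROWS empties the table; it is not dropped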
Don't forget to reclaim the space that has already been wasted:
VACUUM FULL pg_depend;
VACUUM FULL pg_attribute;
Cheers :)
Source: https://stackoverflow.com/questions/50366509/temporary-tables-bloating-pg-attribute