Question
I am trying to take a backup of a TimescaleDB database, excluding two very big hypertables.
That means that while the backup is running, I would not expect to see any COPY command on the underlying chunks, but I actually do!
Let's say TestDB is my database, and it has two big hypertables in schema mySchema, called hyper1 and hyper2, as well as other normal tables.
I run the following command:
pg_dump -U user -F t TestDB --exclude-table "mySchema.hyper1" --exclude-table "mySchema.hyper2" > TestDB_Backup.tar
Then I check the running queries (especially because I did not expect the dump to take this long) and find that several COPY commands are running, one for each chunk of the tables I excluded.
This is TimescaleDB version 1.7.4.
Has this ever happened to any of you, and what is actually going on here?
P.S. I am sorry I cannot really provide a repro for this, and that this is more of a discussion than an actual programmatic problem, but I still hope someone has seen this before and can show me what I am missing :)
Answer 1:
pg_dump dumps each child table separately and independently of its parent, so when you exclude a hypertable, its chunk tables are still dumped. That is why you see COPY commands for all the chunks.
Note that excluding hypertables and chunks will not produce a dump that restores correctly into a TimescaleDB instance, because the TimescaleDB metadata will no longer match the actual state of the database. TimescaleDB maintains catalog tables with information about hypertables and chunks; to pg_dump these are just ordinary user tables, so it dumps them (which is important), but once restored they will still reference all hypertables and chunks that existed in the database before the dump.
So you need to exclude the data of the tables you want to skip (not the hypertables or chunks themselves), which reduces both dump and restore time. It will then be necessary to drop the excluded hypertables after the restore. You exclude table data with the pg_dump parameter --exclude-table-data. There is an issue in the TimescaleDB GitHub repo that discusses how to exclude hypertable data from a dump. The issue suggests how to generate the exclude string:
SELECT string_agg(format($$--exclude-table-data='%s.%s'$$,coalesce(cc.schema_name,c.schema_name), coalesce(cc.table_name, c.table_name)), ' ')
FROM _timescaledb_catalog.hypertable h
INNER JOIN _timescaledb_catalog.chunk c on c.hypertable_id = h.id
LEFT JOIN _timescaledb_catalog.chunk cc on c.compressed_chunk_id = cc.id
WHERE h.schema_name = <foo> AND h.table_name = <bar> ;
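As a sketch of how the generated string could be used (hypothetical: assumes psql access with the same credentials, that the query above runs with your schema and table names substituted for the placeholders, and that no chunk names contain spaces, since the variable is intentionally word-split):

```shell
# Capture the generated --exclude-table-data flags for mySchema.hyper1.
# Note the \$\$ escaping: the SQL dollar-quoting must not be expanded by the shell.
EXCLUDES=$(psql -U user -d TestDB -t -A -c "
SELECT string_agg(format(\$\$--exclude-table-data='%s.%s'\$\$,
       coalesce(cc.schema_name, c.schema_name),
       coalesce(cc.table_name, c.table_name)), ' ')
FROM _timescaledb_catalog.hypertable h
INNER JOIN _timescaledb_catalog.chunk c ON c.hypertable_id = h.id
LEFT JOIN _timescaledb_catalog.chunk cc ON c.compressed_chunk_id = cc.id
WHERE h.schema_name = 'mySchema' AND h.table_name = 'hyper1';")

# Unquoted $EXCLUDES so each generated flag becomes a separate argument.
pg_dump -U user -Fc -f TestDB_Backup.bak $EXCLUDES TestDB
```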
Alternatively, you can find hypertable_id
and exclude data from all chunk tables prefixed with the hypertable id. Find hypertable_id
from catalog table _timescaledb_catalog.hypertable
:
SELECT id
FROM _timescaledb_catalog.hypertable
WHERE schema_name = 'mySchema' AND table_name = 'hyper1';
Let's say that the id is 2. Then dump the database according to the instructions:
pg_dump -U user -Fc -f TestDB_Backup.bak \
--exclude-table-data='_timescaledb_internal._hyper_2*' TestDB
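After restoring, the excluded hypertables still exist (with empty chunks) and need to be dropped, as noted above. A rough sketch of the restore side, assuming the standard TimescaleDB restore procedure with timescaledb_pre_restore()/timescaledb_post_restore() and a target database that already has the extension installed (database and credential names are illustrative):

```shell
# Prepare the target database for a TimescaleDB restore.
psql -U user -d TestDB_Restored -c "SELECT timescaledb_pre_restore();"

# Restore the custom-format dump created with pg_dump -Fc.
pg_restore -U user -d TestDB_Restored TestDB_Backup.bak

# Re-enable normal TimescaleDB operation.
psql -U user -d TestDB_Restored -c "SELECT timescaledb_post_restore();"

# Drop the hypertables whose data was excluded; their chunk tables go with them.
psql -U user -d TestDB_Restored -c "DROP TABLE mySchema.hyper1, mySchema.hyper2;"
```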
Source: https://stackoverflow.com/questions/64994738/pg-dump-with-exclude-table-still-includes-those-tables-in-the-background-copy