pg_dump with --exclude-table still includes those tables in the background COPY commands it runs?

只愿长相守 submitted on 2021-01-29 13:52:46

Question


I am trying to take a backup of a TimescaleDB database, excluding two very big hypertables. That means that while the backup is running, I would not expect to see any COPY commands for the underlying chunks, but I actually do!

Let's say TestDB is my database and it has two big hypertables in schema mySchema called hyper1 and hyper2, as well as other normal tables.

I run the following command:

pg_dump -U user -F t TestDB --exclude-table "mySchema.hyper1" --exclude-table "mySchema.hyper2"  > TestDB_Backup.tar

Then I check the running queries (esp. because I did not expect it to take this long) and I find out that several COPY commands are running, for each chunk of the tables I actually excluded.

This is TimescaleDB version 1.7.4.

Did this ever happen to any of you and what is actually going on here?

P.S. I am sorry I cannot really provide a repro for this, and that this is more of a discussion than an actual programming problem, but I still hope someone has seen this before and can show me what I am missing :)


Answer 1:


pg_dump dumps each child table separately and independently of its parent, so when you exclude a hypertable, its chunk tables are still dumped. That is why you see COPY commands for every chunk of the hypertables you excluded.

Note that excluding hypertables and chunks will not let the dump restore correctly into a TimescaleDB instance, because the TimescaleDB metadata will not match the actual state of the database. TimescaleDB maintains catalog tables with information about hypertables and chunks. To pg_dump they are just ordinary user tables, so it dumps them (which is important), but once restored they will list all the hypertables and chunks that were in the database before the dump.

So you need to exclude the data of the tables you want to leave out (not the hypertables or chunks themselves), which reduces dump and restore time. It will then be necessary to drop the excluded hypertables after the restore. You exclude table data with the pg_dump parameter --exclude-table-data. There is an issue in the TimescaleDB GitHub repo that discusses how to exclude hypertable data from a dump. The issue suggests how to generate the exclude string:

SELECT string_agg(format($$--exclude-table-data='%s.%s'$$,coalesce(cc.schema_name,c.schema_name), coalesce(cc.table_name, c.table_name)), ' ')
FROM _timescaledb_catalog.hypertable h 
  INNER JOIN _timescaledb_catalog.chunk c on c.hypertable_id = h.id 
  LEFT JOIN _timescaledb_catalog.chunk cc on c.compressed_chunk_id = cc.id
WHERE h.schema_name = <foo> AND h.table_name = <bar> ;
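
If you would rather not paste the generated string by hand, the same idea can be scripted. The following is only a sketch: build_exclude_flags is a hypothetical helper name, and the commented usage assumes bash, the connection settings from the question, and a variant of the query above that returns one schema.table name per row instead of one aggregated string.

```shell
# build_exclude_flags is a hypothetical helper, not part of pg_dump or TimescaleDB.
# It reads one "schema.table" chunk name per line on stdin and prints one
# --exclude-table-data flag per non-empty line.
build_exclude_flags() {
  while IFS= read -r name; do
    if [ -n "$name" ]; then
      printf '%s%s\n' '--exclude-table-data=' "$name"
    fi
  done
}

# Assumed usage in bash (psql -A -t prints bare rows, one per line):
#   query="SELECT coalesce(cc.schema_name, c.schema_name) || '.' || coalesce(cc.table_name, c.table_name)
#          FROM _timescaledb_catalog.hypertable h
#          INNER JOIN _timescaledb_catalog.chunk c ON c.hypertable_id = h.id
#          LEFT JOIN _timescaledb_catalog.chunk cc ON c.compressed_chunk_id = cc.id
#          WHERE h.schema_name = 'mySchema' AND h.table_name = 'hyper1'"
#   mapfile -t flags < <(psql -U user -d TestDB -At -c "$query" | build_exclude_flags)
#   pg_dump -U user -Fc -f TestDB_Backup.bak "${flags[@]}" TestDB
```

Collecting the flags in an array keeps each one a separate argument, so no eval or extra quoting is needed when they are handed to pg_dump.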

Alternatively, you can find the hypertable_id and exclude data from all chunk tables whose names are prefixed with that hypertable id. Find the hypertable_id in the catalog table _timescaledb_catalog.hypertable:

SELECT id
FROM _timescaledb_catalog.hypertable
WHERE schema_name = 'mySchema' AND table_name = 'hyper1';

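Let's say that the id is 2. Then dump the database according to the instructions, using the pattern _hyper_2_* (with an underscore after the id) so that chunks of hypertables 20, 21, and so on are not matched as well:

pg_dump -U user -Fc -f TestDB_Backup.bak \
  --exclude-table-data='_timescaledb_internal._hyper_2_*' TestDB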
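
For the two hypertables in the question, the pattern approach can be wrapped in a small helper. This is a sketch under assumptions: chunk_pattern is a hypothetical name, the ids 2 and 3 are placeholders for whatever the catalog query returns, and chunks are assumed to use the default _hyper_<hypertable id>_<chunk id>_chunk naming. The pattern here keeps an underscore after the id so that id 2 does not also match hypertables 20 to 29.

```shell
# chunk_pattern is a hypothetical helper: given a hypertable id, it prints the
# pg_dump flag that excludes the data of that hypertable's chunk tables.
chunk_pattern() {
  printf '%s%s%s\n' '--exclude-table-data=_timescaledb_internal._hyper_' "$1" '_*'
}

# Assumed usage, with ids 2 and 3 standing in for hyper1 and hyper2:
#   pg_dump -U user -Fc -f TestDB_Backup.bak \
#     "$(chunk_pattern 2)" "$(chunk_pattern 3)" TestDB
```

Quoting each "$(chunk_pattern N)" keeps the shell from expanding the * itself, so the pattern reaches pg_dump unchanged.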


Source: https://stackoverflow.com/questions/64994738/pg-dump-with-exclude-table-still-includes-those-tables-in-the-background-copy
