Amazon Redshift at 100% disk usage due to VACUUM query

Posted by 狂风中的少年 on 2019-12-09 09:42:26

Question


Reading the Amazon Redshift documentation, I ran VACUUM on a certain 400 GB table which had never been vacuumed before, in an attempt to improve query performance. Unfortunately, the VACUUM caused the table to grow to 1.7 TB (!!) and brought the cluster's disk usage to 100%. I then tried to stop the VACUUM by running a CANCEL query in the superuser queue (you enter it by running "set query_group='superuser';"), but although the query didn't raise an error, it had no effect on the vacuum query, which keeps running.
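For clarity, the cancel attempt looked roughly like this (the pid below is just a placeholder for the vacuum's process id, not a real value from my cluster):

set query_group='superuser';
cancel 18764;   -- 18764 stands in for the actual pid of the running VACUUM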

What can I do?


Answer 1:


Apparently, there is currently not much you can do. I was on the phone with Amazon support for an hour; they didn't have the tools to stop the vacuum operation. They opened a ticket about CANCEL silently not working on VACUUM queries.

They suggested I take a snapshot of the cluster (which normally takes only a few minutes if you have made previous snapshots) and then restart the cluster. It sort of worked, meaning that the vacuum stopped and some of the disk space was cleared (600 GB), but the table remained more than twice its original size. Because vacuuming it again would be too risky, I resorted to creating a deep copy of it, which produces a vacuumed (fully sorted) copy of the table. (You can read about deep copy here: http://docs.aws.amazon.com/redshift/latest/dg/performing-a-deep-copy.html)
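For reference, a deep copy along the lines of that documentation page looks roughly like this (the table names are placeholders, not from the original question):

-- create an empty copy that inherits the dist key, sort key, and column encodings
create table mytable_copy (like mytable);
-- reload the rows; the copy comes out fully sorted
insert into mytable_copy (select * from mytable);
-- swap the copy in for the original
drop table mytable;
alter table mytable_copy rename to mytable;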




Answer 2:


I have stopped the vacuum operation several times; maybe the feature was not available at the time.
Run the query below, which gives you the process id of the vacuum query.

select * from stv_recents where status='Running';

Once you have the process id, you can run the following query to terminate the process.

select pg_terminate_backend( pid );
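If several queries are running, you can narrow the lookup to the vacuum itself and then pass its pid to pg_terminate_backend (the literal pid below is a placeholder):

select pid, query from stv_recents where status='Running' and query ilike 'vacuum%';
select pg_terminate_backend(12345);   -- replace 12345 with the pid returned above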




Answer 3:


Hint: Run the query below to see which tables you should vacuum.

Note: This helps only when you want to know which tables are big and how much you could gain by vacuuming each one.

select trim(pgdb.datname) as Database,
       trim(a.name) as Table,
       ((b.mbytes/part.total::decimal)*100)::decimal(5,2) as pct_of_total,
       b.mbytes,
       b.unsorted_mbytes
from stv_tbl_perm a
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, sum(decode(unsorted, 1, 1, 0)) as unsorted_mbytes, count(*) as mbytes
      from stv_blocklist group by tbl) b on a.id=b.tbl
join (select sum(capacity) as total
      from stv_partitions where part_begin=0) as part on 1=1
where a.slice=0
order by 3 desc, db_id, name;

Then vacuum the table(s) with a high unsorted_mbytes value: VACUUM your_table;
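If a table is already mostly sorted, a sort-only or threshold vacuum is a lighter-weight option (the table name is a placeholder):

vacuum sort only your_table;
-- or stop once the table is 95 percent sorted
vacuum full your_table to 95 percent;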




Answer 4:


  1. VACUUM should be scheduled regularly; if you vacuum the table on a daily basis, it should be very quick and won't have significant side effects.
  2. In the case you described, it would be safer to scale the cluster up to a larger configuration, run the vacuum, and then scale back down to the original configuration. Remember that free disk space is crucial for query execution on a Redshift cluster: when free disk space runs low, all read/write operations on the cluster become very slow (a quick disk-usage check is sketched below).
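To keep an eye on that, a rough overall disk-usage check (my own addition, not part of the original answer) can be run against stv_partitions:

select sum(used)::decimal / sum(capacity) * 100 as pct_disk_used
from stv_partitions;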


Source: https://stackoverflow.com/questions/24780972/amazon-redshift-at-100-disk-usage-due-to-vacuum-query
