PostgreSQL: Automating VACUUM FULL for bloated tables

南方客 2021-02-09 05:05

We have a product that uses a PostgreSQL database server and is deployed at a couple of hundred clients. Some of them have gathered tens of gigabytes of data over th

2 Answers
  • 2021-02-09 05:11

    You probably don't need it. It is worth doing once, after the first archiving job, to get your disk space back. After that, your daily archiving job and autovacuum should keep dead-tuple bloat in check.
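
    One way to check whether autovacuum is in fact keeping up is to look at the per-table statistics. A minimal sketch, assuming the standard pg_stat_user_tables view; the function name dead_tuple_report_sql and its default row limit are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: prints a query that lists the tables with the most
# dead tuples, together with the time autovacuum last processed them.
dead_tuple_report_sql() {
    local limit="${1:-10}"   # number of rows to show (illustrative default)
    cat <<EOF
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT ${limit};
EOF
}
```

    It could be used as: dead_tuple_report_sql 5 | psql "$DATABASE". A table whose n_dead_tup keeps growing between runs is a sign that autovacuum is not keeping pace.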

    Also, instead of VACUUM FULL it is often better to run CLUSTER table_name USING index_name; ANALYZE table_name;. Like VACUUM FULL, this rewrites the table, but it also orders the rows according to the index. Related rows then end up physically close together on disk, which can reduce disk seeks (important on classic disk drives, largely irrelevant on SSDs) and the number of pages read by your typical queries.

    And remember that both VACUUM FULL and CLUSTER take an ACCESS EXCLUSIVE lock, so the table can be neither read nor written while they run.
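
    Since the table is unavailable while either command runs, it can help to at least bound how long the command waits to get started. A minimal sketch, assuming PostgreSQL 9.3 or later for the lock_timeout setting; the function name cluster_sql, the 5s default, and the table and index names in the usage line are illustrative, and the identifiers are assumed to be trusted names that need no quoting:

```shell
#!/bin/bash
# Hypothetical helper: prints the statements for a CLUSTER + ANALYZE run.
# lock_timeout limits how long CLUSTER waits to ACQUIRE its lock; it does
# not limit how long the lock is held once the rewrite has started.
cluster_sql() {
    local table="$1" index="$2" timeout="${3:-5s}"
    printf "SET lock_timeout = '%s';\n" "$timeout"
    printf 'CLUSTER %s USING %s;\n' "$table" "$index"
    printf 'ANALYZE %s;\n' "$table"
}
```

    It could be used as: cluster_sql orders orders_created_idx | psql -v ON_ERROR_STOP=1 "$DATABASE". With ON_ERROR_STOP set, psql stops if CLUSTER times out instead of going on to ANALYZE.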

  • 2021-02-09 05:14

    OK, I worked my way through it.

    I simplified and reworked the view, splitting it into the following two:

    CREATE OR REPLACE VIEW
        bloat_datawidth AS
    SELECT
        ns.nspname AS schemaname,
        tbl.oid   AS relid,
        tbl.relname,
        CASE
            WHEN every(avg_width IS NOT NULL)
            THEN SUM((1-null_frac)*avg_width) + MAX(null_frac) * 24
            ELSE NULL
        END AS datawidth
    FROM
        pg_attribute att
    JOIN
        pg_class tbl
    ON
        att.attrelid = tbl.oid
    JOIN
        pg_namespace ns
    ON
        ns.oid = tbl.relnamespace
    LEFT JOIN
        pg_stats s
    ON
        s.schemaname=ns.nspname
    AND s.tablename = tbl.relname
    AND s.inherited=false
    AND s.attname=att.attname
    WHERE
        att.attnum > 0
    AND tbl.relkind='r'
    GROUP BY
        1,2,3;
    

    And

    CREATE OR REPLACE VIEW
        bloat_tables AS
    SELECT
        bdw.schemaname,
        bdw.relname,
        bdw.datawidth,
        cc.reltuples::bigint,
        cc.relpages::bigint,
        ceil(cc.reltuples*bdw.datawidth/current_setting('block_size')::NUMERIC)::bigint AS expectedpages,
        100 - (cc.reltuples*100*bdw.datawidth)/(current_setting('block_size')::NUMERIC*cc.relpages) AS bloatpct
    FROM
        bloat_datawidth bdw
    JOIN
        pg_class cc
    ON
        cc.oid = bdw.relid
    AND cc.relpages > 1
    AND bdw.datawidth IS NOT NULL;
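
    As a worked example of the two formulas in bloat_tables, the arithmetic can be reproduced outside the database. This is only a sketch with made-up input numbers (100000 rows, 80 bytes per row, 3000 pages) and the default 8192-byte block size; the function name bloat_estimate is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper mirroring the expectedpages and bloatpct expressions.
# Usage: bloat_estimate RELTUPLES DATAWIDTH RELPAGES [BLOCK_SIZE]
bloat_estimate() {
    LC_ALL=C awk -v t="$1" -v w="$2" -v p="$3" -v b="${4:-8192}" 'BEGIN {
        bytes = t * w                          # estimated bytes of live row data
        expected = int(bytes / b)              # ceil(bytes / block_size) ...
        if (expected * b < bytes) expected++   # ... rounded up by hand
        printf "expectedpages=%d wastedpages=%d bloatpct=%.1f\n",
               expected, p - expected, 100 - (100 * bytes) / (b * p)
    }'
}

bloat_estimate 100000 80 3000
# prints: expectedpages=977 wastedpages=2023 bloatpct=67.4
```

    With these inputs the table would pass both thresholds used by the cron job (more than 65% bloat and more than 100 wasted pages).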
    

    And the cron job:

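    #!/bin/bash
    # Runs VACUUM FULL on the single most bloated table, if one exceeds both thresholds.
    
    MIN_BLOAT=65            # minimum bloat percentage
    MIN_WASTED_PAGES=100    # minimum number of wasted pages
    LOG_FILE=/var/log/postgresql/bloat.log
    DATABASE=unity-stationmaster
    SCHEMA=public
    
    if [[ "$(id -un)" != "postgres" ]]
    then
        echo "You need to be user postgres to run this script."
        exit 1
    fi
    
    # bloat_tables has no wastedpages column, so order by the computed difference.
    TABLENAME=$(psql "$DATABASE" -t -A -c "select relname from bloat_tables where bloatpct > $MIN_BLOAT and relpages-expectedpages > $MIN_WASTED_PAGES and schemaname = '$SCHEMA' order by relpages-expectedpages desc limit 1")
    
    if [[ -z "$TABLENAME" ]]
    then
        echo "No bloated tables." >> "$LOG_FILE"
        exit 0
    fi
    
    # Capture stderr as well, since that is where verbose messages and errors are written.
    vacuumdb -v -f -t "$TABLENAME" "$DATABASE" >> "$LOG_FILE" 2>&1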
    