We run Postgres 9.0 on Windows 2008 Server. There is a large table that contains a bytea column for storing binary data ranging from 0 to 5 MB in each row:
At least on 9.3, PostgreSQL does not rewrite fields stored out of line in TOAST tables when an update leaves those fields unchanged. I don't know if that's true in 9.0.
You can see what storage is used for a column with \d+ tablename; the storage column shows the mode used. Individual values may be stored compressed in-line if they're small enough (e.g. < 2 kB), even in an extended storage column where values are eligible for out-of-line storage.
See the documentation for TOAST and ALTER TABLE ... SET STORAGE.
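As a rough sketch, the same information that \d+ shows can also be read from the pg_attribute catalog. The table name documents and the column name payload are hypothetical placeholders, not names from the question.

```sql
-- Hypothetical names: table "documents", bytea column "payload".
-- List each column's storage mode, as \d+ would display it.
SELECT attname,
       atttypid::regtype AS data_type,
       CASE attstorage
            WHEN 'p' THEN 'plain'
            WHEN 'm' THEN 'main'
            WHEN 'e' THEN 'external'
            WHEN 'x' THEN 'extended'
       END AS storage
FROM   pg_attribute
WHERE  attrelid = 'documents'::regclass
AND    attnum > 0
AND    NOT attisdropped;

-- Optionally ask for out-of-line storage without compression.
ALTER TABLE documents ALTER COLUMN payload SET STORAGE EXTERNAL;
```

Note that SET STORAGE only changes the strategy for values written afterwards; it does not rewrite existing rows.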
Temp files are stored in the tablespaces listed in temp_tablespaces. By default this is empty, in which case temporary objects are created in the default tablespace of the current database, which is pg_default unless the database was created with a different one.
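To check where temporary files would end up on a particular server, something along these lines should work. It is only a sketch using standard settings and catalogs; nothing here is specific to the setup in the question.

```sql
-- Current values of the relevant settings (empty output means "not set").
SHOW temp_tablespaces;
SHOW default_tablespace;

-- Default tablespace of the database you are connected to.
SELECT d.datname,
       t.spcname AS database_default_tablespace
FROM   pg_database d
JOIN   pg_tablespace t ON t.oid = d.dattablespace
WHERE  d.datname = current_database();
```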
Space within tables/indexes should be freed for re-use automatically by autovacuum. Make sure your autovacuum daemon is running often enough and doesn't have too high an autovacuum_vacuum_cost_delay set. Autovacuum has been significantly improved since 9.0.
If you want to free space back to the operating system or for use in other tables, you'll need to VACUUM FULL or use an external tool like pg_repack to do it in a less intrusive manner.
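A minimal sketch for checking whether autovacuum is keeping up, assuming a hypothetical table named documents:

```sql
-- Is autovacuum enabled, how often does it wake up, how much is it throttled?
SHOW autovacuum;
SHOW autovacuum_naptime;
SHOW autovacuum_vacuum_cost_delay;

-- Dead rows waiting to be cleaned up, and when autovacuum last ran.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname = 'documents';

-- Return free space to the operating system.
-- Caution: this rewrites the table and holds an exclusive lock while it runs.
VACUUM FULL documents;
```

A large and growing n_dead_tup with an old or empty last_autovacuum would suggest that autovacuum is not running often enough for this table.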
Picking c) from your questions:
Is there a way we can perform minor boolean field updates to this table WITHOUT rewriting the row (&chewing up diskspace) each time?
As @Craig already explained, columns that are "TOAST-able" and bigger than a certain threshold are stored out-of-line in a dedicated TOAST table per table (separate "relation forks", separate files on disk). So, a 5 MB bytea column would stay mostly untouched in an update if the column itself is not changed. The manual:
During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change.
Bold emphasis mine.
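One way to check this on your own version is to compare the size of the TOAST table before and after a flag-only update. This is only a sketch: the table documents and its columns id and is_processed are hypothetical names.

```sql
-- Size of the TOAST table that belongs to "documents".
SELECT reltoastrelid::regclass AS toast_table,
       pg_size_pretty(pg_relation_size(reltoastrelid::regclass)) AS toast_size
FROM   pg_class
WHERE  oid = 'documents'::regclass
AND    reltoastrelid <> 0;

-- Update only a small flag; the bytea column is not touched.
UPDATE documents SET is_processed = true WHERE id = 42;

-- Run the first query again and compare toast_size.
```

If the statement quoted from the manual holds on your version, the TOAST table should not grow from updates like this one.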
The row in the main relation fork is still copied and a dead row stays behind when updated (whether or not any values actually changed). For large row sizes, the following solution might pay off:
Create a small separate 1:1 table for frequently changed flags, holding just the primary key (which is a foreign key at the same time) and the flags themselves. This would make updates a lot faster and save disk space, at the cost of some initial overhead and a join for queries that need both tables (other queries actually get faster). More about on-disk space requirement of table rows:
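A minimal sketch of the separate 1:1 flags table described above could look like this. All names are hypothetical, and it assumes the main table documents has an integer primary key id and a bytea column payload.

```sql
-- Narrow side table: primary key doubles as foreign key to the main table.
CREATE TABLE documents_flags (
    document_id  integer PRIMARY KEY
                 REFERENCES documents (id) ON DELETE CASCADE,
    is_processed boolean NOT NULL DEFAULT false
);

-- One flags row per existing document.
INSERT INTO documents_flags (document_id)
SELECT id FROM documents;

-- Frequent flag updates now only create new versions of tiny rows.
UPDATE documents_flags
SET    is_processed = true
WHERE  document_id = 42;

-- Queries that need both parts join the two tables.
SELECT d.id, d.payload, f.is_processed
FROM   documents d
JOIN   documents_flags f ON f.document_id = d.id
WHERE  f.is_processed;
```

The trade-off is one extra row header and index per document, in exchange for much smaller dead rows on every flag change.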