String field length in Postgres SQL

半阙折子戏 2021-01-15 04:55

I have a string field in an SQL database, representing a URL. Some URLs are short, and some are very long. I don't really know what the longest URL I might encounter is, so t

3 answers
  •  别那么骄傲
    2021-01-15 05:43

    In PostgreSQL, character(n) is basically just varchar with space padding on input/output. It consumes the same storage as a varchar or text field that has been padded out to the maximum length (see below). char(n) is a historical wart and should be avoided: at least in PostgreSQL, it offers no advantages and has some odd quirks with functions such as left(...).

    varchar(n), varchar and text all consume the same storage - the length of the string you supplied with no padding. It only uses the storage actually required for the characters, irrespective of the length limit. Also, if the string is null, PostgreSQL doesn't store a value for it at all (not even a length header), it just sets the null bit in the record's null bitmap.

    Qualified varchar(n) is basically the same as unqualified varchar with a check constraint on length(colname) <= n.
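
    A minimal sketch of that near-equivalence, assuming two throwaway tables (the names limit_vc and limit_txt and the limit of 5 are illustrative only):

    CREATE TABLE limit_vc  (c varchar(5));
    CREATE TABLE limit_txt (c text CHECK (length(c) <= 5));

    -- Both columns accept a string of exactly 5 characters.
    INSERT INTO limit_vc  VALUES ('abcde');
    INSERT INTO limit_txt VALUES ('abcde');

    -- Both reject a 6-character string, but with different errors:
    -- varchar(5) reports that the value is too long for the type,
    -- while the text column reports a check constraint violation.
    INSERT INTO limit_vc  VALUES ('abcdef');
    INSERT INTO limit_txt VALUES ('abcdef');

    The match is not exact. varchar(n) silently truncates an over-length value when the excess characters are all spaces, and an explicit cast such as 'abcdef'::varchar(5) truncates instead of raising an error. The check constraint does neither.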

    Despite what some other comments/answers are saying, char(n), varchar, varchar(n) and text are all TOASTable types. They can all be stored out of line and/or compressed. To control storage use ALTER TABLE ... ALTER COLUMN ... SET STORAGE.
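
    For example, a sketch assuming a hypothetical table pages with a text column url (neither name comes from the question):

    -- EXTERNAL: allow out-of-line storage, but no compression.
    ALTER TABLE pages ALTER COLUMN url SET STORAGE EXTERNAL;

    -- EXTENDED: allow both compression and out-of-line storage.
    -- This is the usual default for the character types.
    ALTER TABLE pages ALTER COLUMN url SET STORAGE EXTENDED;

    SET STORAGE only sets the strategy for future writes. It does not rewrite rows that are already in the table.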

    If you don't know the max length you'll need, just use text or unqualified varchar. There's no space penalty.

    For more detail see the documentation on character data types, and for some of the innards on how they're stored, see database physical storage in particular TOAST.

    Demo:

    CREATE TABLE somechars(c10 char(10), vc10 varchar(10), vc varchar, t text);
    -- Each row fills exactly one column with the same 9-character string
    -- (two leading spaces, 'abcdef', one trailing space).
    INSERT INTO somechars(c10) VALUES ('  abcdef ');
    INSERT INTO somechars(vc10) VALUES ('  abcdef ');
    INSERT INTO somechars(vc) VALUES ('  abcdef ');
    INSERT INTO somechars(t) VALUES ('  abcdef ');
    

    Output of this query, run once for each column (shown here for c10):

    SELECT 'c10', pg_column_size(c10), octet_length(c10), length(c10)
    FROM somechars WHERE c10 IS NOT NULL;
    

    is:

     ?column? | pg_column_size | octet_length | length 
    ----------+----------------+--------------+--------
     c10      |             11 |           10 |      8
     vc10     |             10 |            9 |      9
     vc       |             10 |            9 |      9
     t        |             10 |            9 |      9
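
    The four rows can also be produced in one statement. This is a sketch that combines the per-column query with UNION ALL, assuming the somechars table from the demo:

    -- One branch per column; each branch selects the single row
    -- in which that column was populated.
    SELECT 'c10', pg_column_size(c10), octet_length(c10), length(c10)
    FROM somechars WHERE c10 IS NOT NULL
    UNION ALL
    SELECT 'vc10', pg_column_size(vc10), octet_length(vc10), length(vc10)
    FROM somechars WHERE vc10 IS NOT NULL
    UNION ALL
    SELECT 'vc', pg_column_size(vc), octet_length(vc), length(vc)
    FROM somechars WHERE vc IS NOT NULL
    UNION ALL
    SELECT 't', pg_column_size(t), octet_length(t), length(t)
    FROM somechars WHERE t IS NOT NULL;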
    

    pg_column_size is the on-disk size of the datum in the field. octet_length is the uncompressed size without headers. length is the "logical" string length.

    So as you can see, the char field is padded. It wastes space, and it also gives what should be a very surprising result for length, given that the input was 9 chars, not 8. That's because Pg can't tell the difference between trailing spaces you put in yourself and trailing spaces it added as padding.

    So, don't use char(n).

    BTW, if I'm designing a database I never use varchar(n) or char(n). I just use the text type and add appropriate check constraints if there are application requirements for the values. I think that varchar(n) is a bit of a wart in the standard, though I guess it's useful for DBs that have on-disk layouts where the size limit might affect storage.
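
    A sketch of that approach for the URL column from the question, assuming a hypothetical links table and an application limit of 2048 characters (the table name, constraint name and both limits are illustrative, not taken from the question):

    CREATE TABLE links (
        id  bigserial PRIMARY KEY,
        url text NOT NULL,
        -- Application rule, kept separate from the column type.
        CONSTRAINT url_max_length CHECK (length(url) <= 2048)
    );

    -- If the requirement changes, swap the constraint; the column stays text.
    ALTER TABLE links DROP CONSTRAINT url_max_length;
    ALTER TABLE links ADD CONSTRAINT url_max_length CHECK (length(url) <= 4096);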
