PostgreSQL - performance of using arrays in a big database

感情败类 2021-01-31 10:47

Let's say we have a table with 6 million records. There are 16 integer columns and a few text columns. It is a read-only table, so every integer column has an index. Every record is ar

1 answer
  •  醉梦人生
    2021-01-31 11:26

    I think you should use an elements table:

    • Postgres can use column statistics to estimate how many rows will match before executing the query, so it can choose the best query plan (this matters more if your data is not evenly distributed);

    • you'll be able to improve data locality for queries by physically ordering the table with CLUSTER elements USING elements_id_element;

    • once Postgres 9.2 is released, you'll be able to take advantage of index-only scans.
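As a rough sketch of the three points above, assuming the `elements` table and the `elements_id_element` index created in the test further down. The composite index name `elements_id_element_id_item` and the value 12345 are made up for illustration:

```sql
-- 1. Refresh planner statistics and inspect what the planner knows
--    about the distribution of id_element.
ANALYZE elements;
SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'elements' AND attname = 'id_element';

-- 2. Physically reorder the table by id_element so that rows for the
--    same element sit on neighbouring pages. CLUSTER takes an exclusive
--    lock and is a one-time operation: later inserts are not kept in order.
CLUSTER elements USING elements_id_element;
ANALYZE elements;

-- 3. On 9.2 or later, an index containing every column the query needs
--    can allow an index-only scan. VACUUM sets the visibility map bits
--    that index-only scans rely on.
--    (hypothetical index name, not part of the original test)
CREATE INDEX elements_id_element_id_item ON elements (id_element, id_item);
VACUUM ANALYZE elements;
EXPLAIN SELECT id_item FROM elements WHERE id_element = 12345;
```

Whether the planner actually chooses an index-only scan depends on the server version, the visibility map, and its cost estimates, so it is worth reading the EXPLAIN output rather than assuming it.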

    That said, I ran some tests with 10M elements:

    create table elements (id_item bigint, id_element bigint);
    insert into elements
      select (random()*524288)::int, (random()*32768)::int
        from generate_series(1,10000000);
    
    \timing
    create index elements_id_item on elements(id_item);
    Time: 15470,685 ms
    create index elements_id_element on elements(id_element);
    Time: 15121,090 ms
    
    select relation, pg_size_pretty(pg_relation_size(relation))
      from (
        select unnest(array['elements','elements_id_item', 'elements_id_element'])
          as relation
      ) as _;
          relation       | pg_size_pretty 
    ---------------------+----------------
     elements            | 422 MB
     elements_id_item    | 214 MB
     elements_id_element | 214 MB
    
    
    
    create table arrays (id_item bigint, a_elements bigint[]);
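    -- id_item has to be selected as well, otherwise the bigint[] aggregate
    -- would be assigned to the bigint column id_item and the insert fails
    insert into arrays select id_item, array_agg(id_element) from elements group by id_item;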
    
    create index arrays_a_elements_idx on arrays using gin (a_elements);
    Time: 22102,700 ms
    
    select relation, pg_size_pretty(pg_relation_size(relation))
      from (
        select unnest(array['arrays','arrays_a_elements_idx']) as relation
      ) as _;
           relation        | pg_size_pretty 
    -----------------------+----------------
     arrays                | 108 MB
     arrays_a_elements_idx | 73 MB
    

    On the other hand, arrays are smaller and have a smaller index. I'd run some tests with 200M elements before making a decision.
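The test above compares sizes only, so it may be worth timing the lookup itself in both layouts. A minimal sketch, assuming both tables are populated as above with `arrays.id_item` holding the item id, and using an arbitrary element value (12345):

```sql
-- Make sure the planner has current statistics for both tables.
ANALYZE elements;
ANALYZE arrays;

-- Elements layout: items containing element 12345,
-- which can use the b-tree index on id_element.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id_item FROM elements WHERE id_element = 12345;

-- Array layout: the same question expressed as array containment,
-- which can use the GIN index on a_elements.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id_item FROM arrays WHERE a_elements @> ARRAY[12345]::bigint[];
```

Running each statement a few times helps separate cold-cache from warm-cache timings, and the BUFFERS figures show how many pages each plan had to touch.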
