How to check a sequence efficiently for used and unused values in PostgreSQL

前端 未结 4 1732
暖寄归人
暖寄归人 2020-12-21 11:36

In PostgreSQL (9.3) I have a table defined as:

CREATE TABLE charts
( recid serial NOT NULL,
  groupid text NOT NULL,
  chart_number integer NOT NULL,
  \"tim         


        
相关标签:
4条回答
  • 2020-12-21 12:10

    In PostgreSQL, a SEQUENCE ensures the two requirements you mention, that is:

    1. No repeats
    2. No changes once assigned

    But because of how a SEQUENCE works (see manual), it can not ensure no-skips. Among others, the first two reasons that come to mind are:

    1. How a SEQUENCE handles concurrent blocks with INSERTS (you could also add that the concept of Cache also makes this impossible)
    2. Also, user triggered DELETEs are an uncontrollable aspect that a SEQUENCE can not handle by itself.

    In both cases, if you still do not want skips, (and if you really know what you're doing) you should have a separate structure that assign IDs (instead of using SEQUENCE). Basically a system that has a list of 'assignable' IDs stored in a TABLE that has a function to pop out IDs in a FIFO way. That should allow you to control DELETEs etc.

    But again, this should be attempted, only if you really know what you're doing! There's a reason why people don't do SEQUENCEs themselves. There are hard corner-cases (for e.g. concurrent INSERTs) and most probably you're over-engineering your problem case, that probably can be solved in a much better / cleaner way.

    0 讨论(0)
  • 2020-12-21 12:21

    Sequence numbers usually have no meaning, so why worry? But if you really want this, then follow the below, cumbersome procedure. Note that it is not efficient; the only efficient option is to forget about the holes and use the sequence.

    In order to avoid having to scan the charts table on every insert, you should scan the table once and store the unused chart_number values in a separate table:

    CREATE TABLE charts_unused_chart_number AS
      SELECT seq.unused
      FROM (SELECT max(chart_number) FROM charts) mx,
           generate_series(1, mx(max)) seq(unused)
      LEFT JOIN charts ON charts.chart_number = seq.unused
      WHERE charts.recid IS NULL;
    

    The above query generates a contiguous series of numbers from 1 to the current maximum chart_number value, then LEFT JOINs the charts table to it and find the records where there is no corresponding charts data, meaning that value of the series is unused as a chart_number.

    Next you create a trigger that fires on an INSERT on the charts table. In the trigger function, pick a value from the table created in the step above:

    CREATE FUNCTION pick_unused_chart_number() RETURNS trigger AS $$
    BEGIN
      -- Get an unused chart number
      SELECT unused INTO NEW.chart_number FROM charts_unused_chart_number LIMIT 1;
    
      -- If the table is empty, get one from the sequence
      IF NOT FOUND THEN
        NEW.chart_number := next_val(charts_chartnumber_seq);
      END IF;
    
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER tr_charts_cn
    BEFORE INSERT ON charts
    FOR EACH ROW EXECUTE PROCEDURE pick_unused_chart_number();
    

    Easy. But the INSERT may fail because of some other trigger aborting the procedure or any other reason. So you need a check to ascertain that the chart_number was indeed inserted:

    CREATE FUNCTION verify_chart_number() RETURNS trigger AS $$
    BEGIN
      -- If you get here, the INSERT was successful, so delete the chart_number
      -- from the temporary table.
      DELETE FROM charts_unused_chart_number WHERE unused = NEW.chart_number;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER tr_charts_verify
    AFTER INSERT ON charts
    FOR EACH ROW EXECUTE PROCEDURE verify_chart_number();
    

    At a certain point the table with unused chart numbers will be empty whereupon you can (1) ALTER TABLE charts to use the sequence instead of an integer for chart_number; (2) delete the two triggers; and (3) the table with unused chart numbers; all in a single transaction.

    0 讨论(0)
  • 2020-12-21 12:23

    Consider not doing it. Read these related answers first:

    • Gap-less sequence where multiple transactions with multiple tables are involved
    • Compacting a sequence in PostgreSQL

    If you still insist on filling in gaps, here is a rather efficient solution:

    1. To avoid searching large parts of the table for the next missing chart_number, create a helper table with all current gaps once:

    CREATE TABLE chart_gap AS
    SELECT chart_number
    FROM   generate_series(1, (SELECT max(chart_number) - 1  -- max is no gap
                               FROM charts)) chart_number
    LEFT   JOIN charts c USING (chart_number)
    WHERE  c.chart_number IS NULL;
    

    2. Set charts_chartnumber_seq to the current maximum and convert chart_number to an actual serial column:

    SELECT setval('charts_chartnumber_seq', max(chart_number)) FROM charts;
    
    ALTER TABLE charts
       ALTER COLUMN chart_number SET NOT NULL
     , ALTER COLUMN chart_number SET DEFAULT nextval('charts_chartnumber_seq');
    
    ALTER SEQUENCE charts_chartnumber_seq OWNED BY charts.chart_number; 
    

    Details:

    • How to reset postgres' primary key sequence when it falls out of sync?
    • Safely and cleanly rename tables that use serial primary key columns in Postgres?

    3. While chart_gap is not empty fetch the next chart_number from there. To resolve possible race conditions with concurrent transactions, without making transactions wait, use advisory locks:

    WITH sel AS (
       SELECT chart_number, ...  -- other input values
       FROM   chart_gap
       WHERE  pg_try_advisory_xact_lock(chart_number)
       LIMIT  1
       FOR    UPDATE
       )
    , ins AS (
       INSERT INTO charts (chart_number, ...) -- other target columns
       TABLE sel 
       RETURNING chart_number
       )
    DELETE FROM chart_gap c
    USING  ins i
    WHERE  i.chart_number = c.chart_number;
    

    Alternatively, Postgres 9.5 or later has the handy FOR UPDATE SKIP LOCKED to make this simpler and faster:

    ...
       SELECT chart_number, ...  -- other input values
       FROM   chart_gap
       LIMIT  1
       FOR    UPDATE SKIP LOCKED
    ...
    

    Detailed explanation:

    • Postgres UPDATE ... LIMIT 1

    Check the result. Once all rows are filled in, this returns 0 rows affected. (you could check in plpgsql with IF NOT FOUND THEN ...). Then switch to a simple INSERT:

       INSERT INTO charts (...)  -- don't list chart_number
       VALUES (...);  --  don't provide chart_number
    
    0 讨论(0)
  • 2020-12-21 12:31

    While what you want is possible, it can't be done using only a SEQUENCE and it requires an exclusive lock on the table, or a retry loop, to work.

    You'll need to:

    • LOCK thetable IN EXCLUSIVE MODE
    • Find the first free ID by querying for the max id then doing a left join over generate_series to find the first free entry. If there is one.
    • If there is a free entry, insert it.
    • If there is no free entry, call nextval and return the result.

    Performance will be absolutely horrible, and transactions will be serialized. There'll be no concurrency. Also, unless the LOCK is the first thing you run that affects that table, you'll face deadlocks that cause transaction aborts.

    You can make this less bad by using an AFTER DELETE .. FOR EACH ROW trigger that keeps track of entries you delete by INSERTing them into a one-column table that keeps track of spare IDs. You can then SELECT the lowest ID from the table in your ID assignment function on the default for the column, avoiding the need for the explicit table lock, the left join on generate_series and the max call. Transactions will still be serialized on a lock on the free IDs table. In PostgreSQL you can even solve that using SELECT ... FOR UPDATE SKIP LOCKED. So if you're on 9.5 you can actually make this non-awful, though it'll still be slow.

    I strongly advise you to just use a SEQUENCE directly, and not bother with re-using values.

    0 讨论(0)
提交回复
热议问题