Optimize INSERT / UPDATE / DELETE operation

Backend · Unresolved · 1 · 1167
没有蜡笔的小新 2021-01-23 09:24

I wonder if the following script can be optimized somehow. It does write a lot to disk because it deletes possibly up-to-date rows and reinserts them. I was thinking about apply

1 Answer
  •  逝去的感伤
    2021-01-23 09:56

    Modified table definition

    If you really need those columns to be NOT NULL and you really need the string 'default' as the default for engine_slug, I would advise introducing column defaults:

         Column      |          Type           |         Modifiers
    -----------------+-------------------------+----------------------------
     id              | integer                 | NOT NULL DEFAULT ...
     engine_slug     | character varying(200)  | NOT NULL DEFAULT 'default'
     content_type_id | integer                 | NOT NULL
     object_id       | text                    | NOT NULL
     object_id_int   | integer                 |
     title           | character varying(1000) | NOT NULL
     description     | text                    | NOT NULL DEFAULT ''
     content         | text                    | NOT NULL
     url             | character varying(1000) | NOT NULL DEFAULT ''
     meta_encoded    | text                    | NOT NULL DEFAULT '{}'
     search_tsv      | tsvector                | NOT NULL
     ...

    The DDL statement would be:

    ALTER TABLE watson_searchentry ALTER COLUMN engine_slug SET DEFAULT 'default';
    

    Etc.

    Then you don't have to insert those values manually every time.
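
    With those defaults in place, an INSERT can simply omit the defaulted columns. A minimal sketch (the literal values are invented for illustration):

    -- engine_slug, description, url and meta_encoded fall back to their defaults
    INSERT INTO watson_searchentry
          (content_type_id, object_id, object_id_int, title, content, search_tsv)
    VALUES (15, '42', 42, 'Jane Doe', 'jane@example.com jane doe Canada'
           ,to_tsvector('english', 'Jane Doe'));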

    Also: object_id text NOT NULL, object_id_int INTEGER? That's odd. I guess you have your reasons ...

    I'll go with your updated requirement:

    The main point is to update columns title and content in watson_searchentry

    Of course, you must add a UNIQUE constraint to enforce your requirements:

    ALTER TABLE watson_searchentry
    ADD CONSTRAINT ws_uni UNIQUE (content_type_id, object_id_int)
    

    The index implementing the constraint will be used - by the query below, for starters.

    BTW, I almost never use varchar(n) in Postgres. Just text. Here's one reason.
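
    If a length limit really is needed, a CHECK constraint on a text column is easier to change later than varchar(n). A hypothetical sketch (table and constraint names invented):

    CREATE TABLE example_tbl (
       title text NOT NULL,
       CONSTRAINT title_len CHECK (length(title) <= 1000)
    );
    -- relaxing the limit later is a cheap catalog change, no table rewrite:
    ALTER TABLE example_tbl DROP CONSTRAINT title_len;
    ALTER TABLE example_tbl ADD  CONSTRAINT title_len CHECK (length(title) <= 2000);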

    Query with data-modifying CTEs

    This could be rewritten as a single SQL query with data-modifying common table expressions, also called "writeable" CTEs. Requires Postgres 9.1 or later.
    Additionally, this query only deletes what has to be deleted, and updates what can be updated.

    WITH  ctyp AS (
       SELECT id AS content_type_id
       FROM   django_content_type
       WHERE  app_label = 'web'
       AND    model = 'member'
       )
    , sel AS (
       SELECT ctyp.content_type_id
             ,m.id       AS object_id_int
             ,m.id::text AS object_id       -- explicit cast!
             ,m.name     AS title
             ,concat_ws(' ', u.email,m.normalized_name,c.name) AS content
             -- other columns have column default now.
       FROM   web_user    u
       JOIN   web_member  m  ON m.user_id = u.id
       JOIN   web_country c  ON c.id = m.country_id
       CROSS  JOIN ctyp
       WHERE  u.is_active
       )
    , del AS (     -- only if you want to del all other entries of same type
       DELETE FROM watson_searchentry w
       USING  ctyp
       WHERE  w.content_type_id = ctyp.content_type_id
       AND    NOT EXISTS (
          SELECT 1
          FROM   sel
          WHERE  sel.object_id_int = w.object_id_int
          )
       )
    , up AS (      -- update existing rows
       UPDATE watson_searchentry w
       SET    object_id = s.object_id
             ,title     = s.title
             ,content   = s.content
       FROM   sel s
       WHERE  w.content_type_id = s.content_type_id
       AND    w.object_id_int   = s.object_id_int
       )
                   -- insert new rows
    INSERT  INTO watson_searchentry (
            content_type_id, object_id_int, object_id, title, content)
    SELECT  sel.*  -- safe to use, because col list is defined accordingly above
    FROM    sel
    LEFT    JOIN watson_searchentry w1 USING (content_type_id, object_id_int)
    WHERE   w1.content_type_id IS NULL;
    
    • The subquery on django_content_type always returns a single value? Otherwise, the CROSS JOIN might cause trouble.

    • The CTE sel gathers the rows to be inserted. Note how I pick matching column names to simplify things.

    • In the CTE del I avoid deleting rows that can be updated.

    • In the CTE up those rows are updated instead.

    • Accordingly, I avoid inserting rows that were not deleted before in the final INSERT.

    Can easily be wrapped into an SQL or PL/pgSQL function for repeated use.

    This is not safe for heavy concurrent use: much better than the function you had, but still not 100% robust against concurrent writes. According to your updated info, though, that's not an issue.

    Replacing the UPDATEs with DELETE and INSERT may or may not be a lot more expensive. Internally every UPDATE results in a new row version anyways, due to the MVCC model.

    Speed first

    If you don't really care about preserving old rows, your simpler approach may be faster: delete everything and insert new rows. Also, wrapping this into a PL/pgSQL function saves a bit of planning overhead. Here is your function, basically, with a couple of minor simplifications, observing the defaults added above:

    CREATE OR REPLACE FUNCTION update_member_search_index()
      RETURNS VOID AS
    $func$
    DECLARE
       _ctype_id int := (
          SELECT id
          FROM   django_content_type
          WHERE  app_label='web'
          AND    model = 'member'
          );  -- assigning at declaration time saves a separate statement
    BEGIN
       DELETE FROM watson_searchentry
       WHERE content_type_id = _ctype_id;
    
       INSERT INTO watson_searchentry
             (content_type_id, object_id, object_id_int, title, content)
       SELECT _ctype_id, m.id::text, m.id, m.name
             ,u.email || ' ' || m.normalized_name || ' ' || c.name
       FROM   web_member  m
       JOIN   web_user    u ON u.id = m.user_id
       JOIN   web_country c ON c.id = m.country_id
       WHERE  u.is_active;
    END
    $func$ LANGUAGE plpgsql;
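
    Once created, the function is called like any other, e.g. from whatever job refreshes the search index:

    SELECT update_member_search_index();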
    

    I even refrain from using concat_ws() here: it is safe against NULL values and simplifies code, but it is a bit slower than plain concatenation with ||.
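
    The trade-off in NULL handling can be seen directly:

    SELECT 'a' || NULL || 'c';              -- NULL: plain concatenation propagates NULL
    SELECT concat_ws(' ', 'a', NULL, 'c');  -- 'a c': concat_ws() skips NULL arguments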

    Also:

    There is a trigger on the table that sets value of column search_tsv based on these columns.

    It would be faster to incorporate the logic into this function - if this is the only time the trigger is needed. Else, it's probably not worth the fuss.
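
    If the trigger only maintains search_tsv, its logic could be folded into the INSERT like this. A sketch: the to_tsvector() expression is an assumption, since the actual trigger logic isn't shown in the question:

    INSERT INTO watson_searchentry
          (content_type_id, object_id, object_id_int, title, content, search_tsv)
    SELECT _ctype_id, m.id::text, m.id, m.name
          ,u.email || ' ' || m.normalized_name || ' ' || c.name
          ,to_tsvector('english', m.name || ' ' || u.email || ' '
                                  || m.normalized_name || ' ' || c.name)
    FROM   web_member  m
    JOIN   web_user    u ON u.id = m.user_id
    JOIN   web_country c ON c.id = m.country_id
    WHERE  u.is_active;

    The trigger would then have to be skipped for this bulk load (e.g. with ALTER TABLE ... DISABLE TRIGGER) so the work isn't done twice.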
