Optimize INSERT / UPDATE / DELETE operation

后端未结

关注

 1  1167

没有蜡笔的小新 2021-01-23 09:24

I wonder if the following script can be optimized somehow. It does write a lot to disk because it deletes possibly up-to-date rows and reinserts them. I was thinking about apply

1条回答

逝去的感伤 (楼主)

2021-01-23 09:56

Modified table definition

If you really need those columns to be NOT NULL and you really need the string 'default' as default for engine_slug, I would advice to introduce column defaults:

COLUMN           |          TYPE           |      Modifiers
-----------------+-------------------------+---------------------
 id              | INTEGER                 | NOT NULL DEFAULT ... 
 engine_slug     | CHARACTER VARYING(200)  | NOT NULL DEFAULT 'default'
 content_type_id | INTEGER                 | NOT NULL
 object_id       | text                    | NOT NULL
 object_id_int   | INTEGER                 |
 title           | CHARACTER VARYING(1000) | NOT NULL
 description     | text                    | NOT NULL DEFAULT ''
 content         | text                    | NOT NULL
 url             | CHARACTER VARYING(1000) | NOT NULL DEFAULT ''
 meta_encoded    | text                    | NOT NULL DEFAULT '{}'
 search_tsv      | tsvector                | NOT NULL
 ...

DDL statement would be:

ALTER TABLE watson_searchentry ALTER COLUMN  engine_slug DEFAULT 'default';

Etc.

Then you don't have to insert those values manually every time.

Also: object_id text NOT NULL, object_id_int INTEGER? That's odd. I guess you have your reasons ...

I'll go with your updated requirement:

The main point is to update columns title and content in watson_searchentry

Of course, you must add a UNIQUE constraint to enforce your requirements:

ALTER TABLE watson_searchentry
ADD CONSTRAINT ws_uni UNIQUE (content_type_id, object_id_int)

The accompanying index will be used. By this query for starters.

BTW, I almost never use varchar(n) in Postgres. Just text. Here's one reason.

Query with data-modifying CTEs

This could be rewritten as a single SQL query with data-modifying common table expressions, also called "writeable" CTEs. Requires Postgres 9.1 or later.
Additionally, this query only deletes what has to be deleted, and updates what can be updated.

WITH  ctyp AS (
   SELECT id AS content_type_id
   FROM   django_content_type
   WHERE  app_label = 'web'
   AND    model = 'member'
   )
, sel AS (
   SELECT ctyp.content_type_id
         ,m.id       AS object_id_int
         ,m.id::text AS object_id       -- explicit cast!
         ,m.name     AS title
         ,concat_ws(' ', u.email,m.normalized_name,c.name) AS content
         -- other columns have column default now.
   FROM   web_user    u
   JOIN   web_member  m  ON m.user_id = u.id
   JOIN   web_country c  ON c.id = m.country_id
   CROSS  JOIN ctyp
   WHERE  u.is_active
   )
, del AS (     -- only if you want to del all other entries of same type
   DELETE FROM watson_searchentry w
   USING  ctyp
   WHERE  w.content_type_id = ctyp.content_type_id
   AND    NOT EXISTS (
      SELECT 1
      FROM   sel
      WHERE  sel.object_id_int = w.object_id_int
      )
   )
, up AS (      -- update existing rows
   UPDATE watson_searchentry 
   SET    object_id = s.object_id
         ,title     = s.title
         ,content   = s.content
   FROM   sel s
   WHERE  w.content_type_id = s.content_type_id
   AND    w.object_id_int   = s.object_id_int
   )
               -- insert new rows
INSERT  INTO watson_searchentry (
        content_type_id, object_id_int, object_id, title, content)
SELECT  sel.*  -- safe to use, because col list is defined accordingly above
FROM    sel
LEFT    JOIN watson_searchentry w1 USING (content_type_id, object_id_int)
WHERE   w1.content_type_id IS NULL;

The subquery on django_content_type always returns a single value? Otherwise, the CROSS JOIN might cause trouble.
The first CTE sel gathers the rows to be inserted. Note how I pick matching column names to simplify things.
In the CTE del I avoid deleting rows that can be updated.
In the CTE up those rows are updated instead.
Accordingly, I avoid inserting rows that were not deleted before in the final INSERT.

Can easily be wrapped into an SQL or PL/pgSQL function for repeated use.

Not secure for heavy concurrent use. Much better than the function you had, but still not 100% robust against concurrent writes. But that's not an issue according to your updated info.

Replacing the UPDATEs with DELETE and INSERT may or may not be a lot more expensive. Internally every UPDATE results in a new row version anyways, due to the MVCC model.

Speed first

If you don't really care about preserving old rows, your simpler approach may be faster: Delete everything and insert new rows. Also, wrapping into a plpgsql function saves a bit of planning overhead. Your function basically, with a couple of minor simplifications and observing the defaults added above:

CREATE OR REPLACE FUNCTION update_member_search_index()
  RETURNS VOID AS
$func$
DECLARE
   _ctype_id int := (
      SELECT id
      FROM   django_content_type
      WHERE  app_label='web'
      AND    model = 'member'
      );  -- you can assign at declaration time. saves another statement
BEGIN
   DELETE FROM watson_searchentry
   WHERE content_type_id = _ctype_id;

   INSERT INTO watson_searchentry
         (content_type_id, object_id, object_id_int, title, content)
   SELECT _ctype_id, m.id, m.id::int,m.name
         ,u.email || ' ' || m.normalized_name || ' ' || c.name
   FROM   web_member  m
   JOIN   web_user    u USING (user_id)
   JOIN   web_country c ON c.id = m.country_id
   WHERE  u.is_active;
END
$func$ LANGUAGE plpgsql;

I even refrain from using concat_ws(): It is safe against NULL values and simplifies code, but a bit slower than simple concatenation.

Also:

There is a trigger on the table that sets value of column search_tsv based on these columns.

It would be faster to incorporate the logic into this function - if this is the only time the trigger is needed. Else, it's probably not worth the fuss.

0 讨论(0)