I wonder if the following script can be optimized somehow. It does write a lot to disk because it deletes possibly up-to-date rows and reinserts them. I was thinking about apply
If you really need those columns to be NOT NULL and you really need the string 'default' as default for engine_slug, I would advise introducing column defaults:
COLUMN | TYPE | Modifiers
-----------------+-------------------------+---------------------
id | INTEGER | NOT NULL DEFAULT ...
engine_slug | CHARACTER VARYING(200) | NOT NULL DEFAULT 'default'
content_type_id | INTEGER | NOT NULL
object_id | text | NOT NULL
object_id_int | INTEGER |
title | CHARACTER VARYING(1000) | NOT NULL
description | text | NOT NULL DEFAULT ''
content | text | NOT NULL
url | CHARACTER VARYING(1000) | NOT NULL DEFAULT ''
meta_encoded | text | NOT NULL DEFAULT '{}'
search_tsv | tsvector | NOT NULL
...
DDL statement would be:
ALTER TABLE watson_searchentry ALTER COLUMN engine_slug SET DEFAULT 'default';
Etc.
Then you don't have to insert those values manually every time.
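To spell out the "Etc.": a sketch that sets all the defaults shown in the table listing above in one statement. The column names and default values are taken from that listing; adjust if your actual table differs.

```sql
-- One ALTER TABLE can carry several actions, separated by commas.
ALTER TABLE watson_searchentry
  ALTER COLUMN engine_slug  SET DEFAULT 'default'
, ALTER COLUMN description  SET DEFAULT ''
, ALTER COLUMN url          SET DEFAULT ''
, ALTER COLUMN meta_encoded SET DEFAULT '{}';
```

Setting a default only affects future inserts that omit the column. Existing rows are not touched.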
Also: object_id text NOT NULL, object_id_int INTEGER? That's odd. I guess you have your reasons ...
I'll go with your updated requirement:
The main point is to update columns title and content in watson_searchentry
Of course, you must add a UNIQUE constraint to enforce your requirements:
ALTER TABLE watson_searchentry
ADD CONSTRAINT ws_uni UNIQUE (content_type_id, object_id_int);
The accompanying index will be used, by this query for starters.
BTW, I almost never use varchar(n) in Postgres. Just text. Here's one reason.
This could be rewritten as a single SQL query with data-modifying common table expressions, also called "writeable" CTEs. Requires Postgres 9.1 or later.
Additionally, this query only deletes what has to be deleted, and updates what can be updated.
WITH ctyp AS (
SELECT id AS content_type_id
FROM django_content_type
WHERE app_label = 'web'
AND model = 'member'
)
, sel AS (
SELECT ctyp.content_type_id
,m.id AS object_id_int
,m.id::text AS object_id -- explicit cast!
,m.name AS title
,concat_ws(' ', u.email,m.normalized_name,c.name) AS content
-- other columns have column default now.
FROM web_user u
JOIN web_member m ON m.user_id = u.id
JOIN web_country c ON c.id = m.country_id
CROSS JOIN ctyp
WHERE u.is_active
)
, del AS ( -- only if you want to del all other entries of same type
DELETE FROM watson_searchentry w
USING ctyp
WHERE w.content_type_id = ctyp.content_type_id
AND NOT EXISTS (
SELECT 1
FROM sel
WHERE sel.object_id_int = w.object_id_int
)
)
, up AS ( -- update existing rows
UPDATE watson_searchentry w
SET object_id = s.object_id
,title = s.title
,content = s.content
FROM sel s
WHERE w.content_type_id = s.content_type_id
AND w.object_id_int = s.object_id_int
)
-- insert new rows
INSERT INTO watson_searchentry (
content_type_id, object_id_int, object_id, title, content)
SELECT sel.* -- safe to use, because col list is defined accordingly above
FROM sel
LEFT JOIN watson_searchentry w1 USING (content_type_id, object_id_int)
WHERE w1.content_type_id IS NULL;
Does the subquery on django_content_type always return a single value? Otherwise, the CROSS JOIN might cause trouble.
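A quick way to check that assumption before relying on the CROSS JOIN, using the same filter as the CTE ctyp:

```sql
-- Should return exactly 1.
-- More than 1 would multiply every row in sel; 0 would leave sel empty.
SELECT count(*) AS matching_rows
FROM   django_content_type
WHERE  app_label = 'web'
AND    model = 'member';
```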
The CTE sel gathers the rows to be inserted. Note how I pick matching column names to simplify things.
In the CTE del I avoid deleting rows that can be updated.
In the CTE up those rows are updated instead.
Accordingly, in the final INSERT I avoid inserting rows that were not deleted before.
Can easily be wrapped into an SQL or PL/pgSQL function for repeated use.
Not secure for heavy concurrent use. Much better than the function you had, but still not 100% robust against concurrent writes. But that's not an issue according to your updated info.
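If concurrent writes ever do become a concern, one blunt option is to take a table lock for the duration of the refresh. A minimal sketch, assuming it is acceptable to block other writers to watson_searchentry while the statement runs (the lock mode is my choice for illustration, not something your setup requires):

```sql
BEGIN;

-- SHARE ROW EXCLUSIVE conflicts with the lock taken by INSERT / UPDATE / DELETE
-- and with itself, so other writers and a second run of this script wait.
-- Plain SELECTs are not blocked.
LOCK TABLE watson_searchentry IN SHARE ROW EXCLUSIVE MODE;

-- ... run the WITH ... INSERT statement from above here ...

COMMIT;  -- the lock is released at the end of the transaction
```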
Replacing the UPDATEs with DELETE and INSERT may or may not be a lot more expensive. Internally every UPDATE results in a new row version anyways, due to the MVCC model.
If you don't really care about preserving old rows, your simpler approach may be faster: Delete everything and insert new rows. Also, wrapping into a plpgsql function saves a bit of planning overhead. Your function basically, with a couple of minor simplifications and observing the defaults added above:
CREATE OR REPLACE FUNCTION update_member_search_index()
RETURNS VOID AS
$func$
DECLARE
_ctype_id int := (
SELECT id
FROM django_content_type
WHERE app_label='web'
AND model = 'member'
); -- you can assign at declaration time. saves another statement
BEGIN
DELETE FROM watson_searchentry
WHERE content_type_id = _ctype_id;
INSERT INTO watson_searchentry
(content_type_id, object_id, object_id_int, title, content)
SELECT _ctype_id, m.id::text, m.id, m.name  -- object_id is text, object_id_int is integer
,u.email || ' ' || m.normalized_name || ' ' || c.name
FROM web_member m
JOIN web_user u ON u.id = m.user_id  -- same join condition as in the query above
JOIN web_country c ON c.id = m.country_id
WHERE u.is_active;
END
$func$ LANGUAGE plpgsql;
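Call the function like this, once it has been created:

```sql
SELECT update_member_search_index();
```

A function call always runs inside a single transaction, so other sessions keep seeing the old rows until the DELETE and the INSERT are committed together.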
I even refrain from using concat_ws(): it is safe against NULL values and simplifies code, but is a bit slower than simple concatenation.
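To illustrate the difference, a small standalone comparison with made-up literal values (no tables involved):

```sql
SELECT 'a@example.com' || ' ' || NULL::text || ' ' || 'Norway' AS with_operator   -- NULL
     , concat_ws(' ', 'a@example.com', NULL::text, 'Norway')   AS with_concat_ws; -- 'a@example.com Norway'
```

So plain concatenation is only an option if none of email, normalized_name and name can be NULL. I don't know your column definitions, so that is an assumption to verify: with a NULL in any of them, the result is NULL and the INSERT fails on the NOT NULL constraint of content.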
Also:
There is a trigger on the table that sets value of column search_tsv based on these columns.
It would be faster to incorporate the logic into this function - if this is the only time the trigger is needed. Else, it's probably not worth the fuss.