Slow PostgreSQL query in production - help me understand this explain analyze output


忘掉有多难 2021-02-10 04:26

I have a query that takes 9 minutes to run on PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit.

This


囚心锁ツ (OP)

2021-02-10 05:27

I rewrote your query and expect this version to be faster:

```sql
SELECT u.id AS id14_, u.first_name AS first2_14_, u.last_name AS last3_14_,
       u.street_1 AS street4_14_, u.street_2 AS street5_14_, u.city AS city14_,
       u.us_state_id AS us7_14_, u.region AS region14_, u.country_id AS country9_14_,
       u.postal_code AS postal10_14_, u.user_name AS user11_14_, u.password AS password14_,
       u.profession AS profession14_, u.phone AS phone14_, u.url AS url14_, u.bio AS bio14_,
       u.last_login AS last17_14_, u.status AS status14_, u.birthdate AS birthdate14_,
       u.ageinyears AS ageinyears14_, u.deleted AS deleted14_, u.createdate AS createdate14_,
       u.audit AS audit14_, u.migrated2008 AS migrated24_14_, u.creator AS creator14_
FROM   dir_users u
WHERE  u.status = 'active'
AND    u.deleted = FALSE
AND    EXISTS (
   SELECT 1
   FROM   dir_memberships m
   JOIN   dir_roles       r ON r.id = m.role
   JOIN   dir_groups      g ON g.id = m.group_id
   WHERE  m.group_id = 15499
   AND    m.user_id = u.id
   AND   (m.expires IS NULL
       OR m.expires > now() AND (m.startdate IS NULL OR m.startdate < now()))
   AND    m.deleted = FALSE
   AND    r.deleted = FALSE
   AND    r.name = 'ROLE_MEMBER'
   AND    g.deleted = FALSE
   )
AND    EXISTS (
   SELECT 1
   FROM   dir_memberships m
   JOIN   dir_roles       r ON r.id = m.role
   WHERE (m.expires IS NULL
       OR m.expires > now() AND (m.startdate IS NULL OR m.startdate < now()))
   AND    m.deleted = FALSE
   AND    m.user_id = u.id
   AND    r.name = 'ROLE_TEACHER_MEMBER'
   );
```

Rewrite with `EXISTS`

- Replaced the weird `CASE ... END = 1` expressions with simple expressions.
- Rewrote all joins with explicit JOIN syntax to make the query easier to read.
- Transformed the big JOIN construct and the IN expression into two EXISTS semi-joins, which removes the need for DISTINCT. This should be quite a bit faster.
- Made lots of minor edits to simplify the query; they don't change its substance.
- In particular, I use simpler aliases; what you had was noisy and confusing.
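To illustrate why the EXISTS semi-join removes the need for DISTINCT, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for PostgreSQL. The tables are a hypothetical, stripped-down version of dir_users and dir_memberships, and there is a single EXISTS instead of two; it demonstrates result semantics only, not performance.

```python
import sqlite3

# SQLite stands in for PostgreSQL; this illustrates results, not speed.
# Hypothetical toy schema with only the columns needed for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dir_users (id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE dir_memberships (user_id INTEGER, group_id INTEGER, role INTEGER);
    INSERT INTO dir_users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    -- alice: two memberships in group 15499 (two roles), bob: one, carol: none
    INSERT INTO dir_memberships VALUES
        (1, 15499, 10), (1, 15499, 11), (2, 15499, 10), (3, 42, 10);
""")

# Plain join: every matching membership row repeats the user row.
join_rows = conn.execute("""
    SELECT u.id FROM dir_users u
    JOIN dir_memberships m ON m.user_id = u.id
    WHERE m.group_id = 15499
    ORDER BY u.id
""").fetchall()

# Join + DISTINCT: correct, but duplicates are produced and then removed.
distinct_rows = conn.execute("""
    SELECT DISTINCT u.id FROM dir_users u
    JOIN dir_memberships m ON m.user_id = u.id
    WHERE m.group_id = 15499
    ORDER BY u.id
""").fetchall()

# EXISTS semi-join: each user row is returned at most once, no DISTINCT needed.
exists_rows = conn.execute("""
    SELECT u.id FROM dir_users u
    WHERE EXISTS (
        SELECT 1 FROM dir_memberships m
        WHERE m.user_id = u.id AND m.group_id = 15499
    )
    ORDER BY u.id
""").fetchall()

print(join_rows)      # [(1,), (1,), (2,)]
print(distinct_rows)  # [(1,), (2,)]
print(exists_rows)    # [(1,), (2,)]
```

A user with two qualifying memberships shows up twice in the plain join, which is why a join-based query needs DISTINCT. The semi-join only asks whether at least one matching row exists.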

Indexes

If this isn't fast enough yet, and your write performance can deal with more indexes, add this partial multi-column index:

```sql
CREATE INDEX dir_memberships_g_id_u_id_idx ON dir_memberships (group_id, user_id)
WHERE  deleted = FALSE;
```

The WHERE conditions of your query have to match the WHERE condition of the index for the index to be useful!
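A quick way to see this rule in action, again with sqlite3 standing in for PostgreSQL (SQLite has supported partial indexes since version 3.8.0): the same partial index is usable when the query repeats `deleted = 0` and is skipped when it doesn't. The schema is hypothetical, and SQLite stores booleans as 0/1. The exact rules each planner uses to prove that the query's WHERE clause implies the index's WHERE clause differ, but the principle is the same.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dir_memberships (
        id       INTEGER PRIMARY KEY,
        user_id  INTEGER,
        group_id INTEGER,
        role     INTEGER,
        deleted  INTEGER NOT NULL DEFAULT 0  -- 0 plays the role of FALSE
    );
    CREATE INDEX dir_memberships_g_id_u_id_idx
        ON dir_memberships (group_id, user_id)
        WHERE deleted = 0;
""")

def plan(sql: str) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# The query repeats the index condition, so the partial index qualifies.
with_filter = plan("""
    SELECT * FROM dir_memberships
    WHERE group_id = 15499 AND user_id = 1 AND deleted = 0
""")

# Without "deleted = 0" the index might miss matching rows, so it is not used.
without_filter = plan("""
    SELECT * FROM dir_memberships
    WHERE group_id = 15499 AND user_id = 1
""")

print(with_filter)
print(without_filter)
```

The exact wording of the plan lines varies between SQLite versions, but only the first plan mentions dir_memberships_g_id_u_id_idx. In PostgreSQL, EXPLAIN gives you the equivalent check.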

I assume that you already have primary keys and indexes on relevant foreign keys.

Further:

```sql
CREATE INDEX dir_memberships_u_id_role_idx ON dir_memberships (user_id, role)
WHERE  deleted = FALSE;
```

Why user_id a second time? See:

Working of indexes in PostgreSQL
Is a composite index also good for queries on the first field?

Also, since user_id is already used in another index, you are not blocking HOT updates (which are only possible when an update changes no indexed column).
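The linked questions explain why a multicolumn B-tree index also serves queries on its leading column alone. The following minimal sqlite3 sketch (SQLite standing in for PostgreSQL, hypothetical schema) shows the planner using the (user_id, role) partial index both for a lookup on user_id only and for a lookup on both columns:

```python
import sqlite3

# Hypothetical schema; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dir_memberships (
        id       INTEGER PRIMARY KEY,
        user_id  INTEGER,
        group_id INTEGER,
        role     INTEGER,
        deleted  INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX dir_memberships_u_id_role_idx
        ON dir_memberships (user_id, role)
        WHERE deleted = 0;
""")

def plan(sql: str) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Only the leading column is constrained: the index is still usable.
leading_only = plan(
    "SELECT * FROM dir_memberships WHERE user_id = 1 AND deleted = 0")

# Both index columns constrained: the search narrows on both.
both_cols = plan(
    "SELECT * FROM dir_memberships WHERE user_id = 1 AND role = 7 AND deleted = 0")

print(leading_only)
print(both_cols)
```

The plan detail lists which index columns drive the search: `(user_id=?)` for the first query and `(user_id=? AND role=?)` for the second.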

Why role?
I assume both columns are of type integer (4 bytes). As I saw in your detailed question, you run a 64-bit OS, where MAXALIGN is 8 bytes, so a second integer column will not make the index grow at all. I threw in role because it might be useful for the second EXISTS semi-join.
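The size argument can be checked with a little arithmetic. This is a simplified Python model, not PostgreSQL code. It assumes an 8-byte index tuple header (6-byte heap TID plus 2-byte t_info), no NULL values (so no null bitmap), 4-byte integer keys, and MAXALIGN = 8:

```python
# Simplified model of a PostgreSQL B-tree index tuple size (assumptions above).
# The 4-byte line pointer every tuple also needs on the page is left out,
# since it is the same for each variant.
HEADER = 8     # IndexTupleData: 6-byte TID + 2-byte t_info
MAXALIGN = 8   # typical for 64-bit builds
INT4 = 4       # size and alignment of an integer column

def align(n: int, a: int) -> int:
    """Round n up to the next multiple of a."""
    return (n + a - 1) // a * a

def index_tuple_size(n_int4_cols: int) -> int:
    size = HEADER
    for _ in range(n_int4_cols):
        size = align(size, INT4) + INT4  # each int4 is 4-byte aligned
    return align(size, MAXALIGN)         # whole tuple is padded to MAXALIGN

for cols in (1, 2, 3):
    print(cols, "int4 column(s):", index_tuple_size(cols), "bytes")
# 1 int4 column(s): 16 bytes
# 2 int4 column(s): 16 bytes
# 3 int4 column(s): 24 bytes
```

With one integer, 12 bytes are padded up to 16. The second integer fills that padding, and only a third one pushes the tuple to 24 bytes.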

If you have many "dead" users, this might also help:

```sql
CREATE INDEX dir_users_id_idx ON dir_users (id)
WHERE  status = 'active' AND deleted = FALSE;
```

As always, check with EXPLAIN to see whether the indexes actually get used. You wouldn't want useless indexes consuming resources.

Are we fast yet?

Of course, all the usual advice for performance optimization applies, too.
