Slow PostgreSQL query in production - help me understand this explain analyze output

忘掉有多难 2021-02-10 04:26

I have a query that takes 9 minutes to run on PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit.

This

2 Answers
  •  囚心锁ツ
    2021-02-10 05:27

    I rewrote your query and expect this version to be faster:

    SELECT u.id AS id14_, u.first_name AS first2_14_, u.last_name AS last3_14_,
           u.street_1 AS street4_14_, u.street_2 AS street5_14_, u.city AS city14_,
           u.us_state_id AS us7_14_, u.region AS region14_, u.country_id AS country9_14_,
           u.postal_code AS postal10_14_, u.user_name AS user11_14_, u.password AS password14_,
           u.profession AS profession14_, u.phone AS phone14_, u.url AS url14_, u.bio AS bio14_,
           u.last_login AS last17_14_, u.status AS status14_, u.birthdate AS birthdate14_,
           u.ageinyears AS ageinyears14_, u.deleted AS deleted14_, u.createdate AS createdate14_,
           u.audit AS audit14_, u.migrated2008 AS migrated24_14_, u.creator AS creator14_
    FROM   dir_users u 
    WHERE  u.status = 'active'
    AND    u.deleted = FALSE
    AND    EXISTS (
       SELECT 1
       FROM   dir_memberships m
       JOIN   dir_roles       r ON r.id = m.role
       JOIN   dir_groups      g ON g.id = m.group_id
       WHERE  m.group_id = 15499
       AND    m.user_id = u.id
       AND   (m.expires IS NULL
           OR m.expires > now() AND (m.startdate IS NULL OR m.startdate < now()))
       AND    m.deleted = FALSE
       AND    r.deleted = FALSE
       AND    r.name = 'ROLE_MEMBER'
       AND    g.deleted = FALSE
       )
    AND    EXISTS (
        SELECT 1
        FROM   dir_memberships m
        JOIN   dir_roles       r ON r.id = m.role
        WHERE (m.expires IS NULL
            OR m.expires > now() AND (m.startdate IS NULL OR m.startdate < now()))
        AND    m.deleted = FALSE
        AND    m.user_id = u.id
        AND    r.name = 'ROLE_TEACHER_MEMBER'
        );
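    A small Python sketch of the membership-validity condition used in both subqueries may help when checking it against the original CASE logic. It assumes expires and startdate are nullable timestamps (None stands for NULL) and that "now" is passed in explicitly; the function name is hypothetical. Because AND binds tighter than OR (in SQL as in Python), the condition reads expires IS NULL OR (expires > now() AND (...)), so a membership without an expiry date passes regardless of startdate. That is worth double-checking against what the CASE expressions originally did.

```python
from datetime import datetime
from typing import Optional


def membership_is_current(expires: Optional[datetime],
                          startdate: Optional[datetime],
                          now: datetime) -> bool:
    """Hypothetical mirror of the SQL condition; None plays the role of NULL.

    Same operator layout as the SQL:
      m.expires IS NULL
      OR m.expires > now() AND (m.startdate IS NULL OR m.startdate < now())
    'and' binds tighter than 'or', so startdate is only consulted
    when expires is not None.
    """
    return expires is None or expires > now and (startdate is None or startdate < now)


if __name__ == "__main__":
    now = datetime(2021, 2, 10)
    future = datetime(2021, 3, 1)
    # No expiry date: passes even though startdate lies in the future.
    print(membership_is_current(None, future, now))
```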
    

    Rewrite with EXISTS

    • Replaced the awkward case ... end = 1 expressions with simple boolean expressions.
    • Rewrote all joins with explicit JOIN syntax to make the query easier to read.
    • Transformed the big JOIN construct and the IN expression into two EXISTS semi-joins, which removes the need for DISTINCT. This should be quite a bit faster.
    • Made lots of minor edits to simplify the query without changing its substance.
      In particular, I used simpler aliases; what you had was noisy and confusing.
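    To illustrate why an EXISTS semi-join makes DISTINCT unnecessary, here is a self-contained Python sketch that uses the standard-library sqlite3 module as a stand-in for PostgreSQL. The tables are a tiny hypothetical subset of dir_users and dir_memberships: one user holds two roles in the same group, so a plain JOIN returns that user twice, while EXISTS only asks "is there at least one matching row?" and returns each user once.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical subset of the dir_users / dir_memberships schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dir_users (id INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE dir_memberships (user_id INTEGER, group_id INTEGER, role TEXT);
        INSERT INTO dir_users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        -- alice has two roles in group 15499, bob has one, carol is in another group
        INSERT INTO dir_memberships VALUES
            (1, 15499, 'ROLE_MEMBER'),
            (1, 15499, 'ROLE_TEACHER_MEMBER'),
            (2, 15499, 'ROLE_MEMBER'),
            (3, 42,    'ROLE_MEMBER');
    """)
    return conn


# Plain join: one output row per matching membership, hence duplicates.
JOIN_ONLY = """
    SELECT u.id, u.user_name
    FROM dir_users u
    JOIN dir_memberships m ON m.user_id = u.id
    WHERE m.group_id = 15499
    ORDER BY u.id
"""

# The join needs DISTINCT to fold the duplicates back together.
JOIN_DISTINCT = """
    SELECT DISTINCT u.id, u.user_name
    FROM dir_users u
    JOIN dir_memberships m ON m.user_id = u.id
    WHERE m.group_id = 15499
    ORDER BY u.id
"""

# Semi-join: each user row is emitted at most once, no DISTINCT required.
EXISTS_SEMI = """
    SELECT u.id, u.user_name
    FROM dir_users u
    WHERE EXISTS (
        SELECT 1 FROM dir_memberships m
        WHERE m.user_id = u.id AND m.group_id = 15499)
    ORDER BY u.id
"""

if __name__ == "__main__":
    conn = build_demo_db()
    for label, sql in [("JOIN", JOIN_ONLY),
                       ("JOIN + DISTINCT", JOIN_DISTINCT),
                       ("EXISTS", EXISTS_SEMI)]:
        print(label, conn.execute(sql).fetchall())
```

    Beyond removing the sort or hash step that DISTINCT needs, the semi-join lets the executor stop probing dir_memberships for a user as soon as the first match is found.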

    Indexes

    If this isn't fast enough yet, and your write performance can afford more indexes, add this partial multicolumn index:

    CREATE INDEX dir_memberships_g_id_u_id_idx ON dir_memberships (group_id, user_id)
    WHERE  deleted = FALSE;
    

    The WHERE condition of the partial index has to match your query (more precisely, the query's WHERE clause must imply the index predicate) for the index to be useful!
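    The predicate-matching rule can be observed with a quick experiment. The sketch below uses SQLite (via Python's sqlite3 module) as a stand-in, because it also supports partial indexes and only uses one when the query's WHERE clause implies the index's WHERE clause; PostgreSQL's planner applies the same kind of implication test. The table is a hypothetical cut-down dir_memberships, and deleted = 0 stands in for deleted = FALSE.

```python
import sqlite3

INDEX_NAME = "dir_memberships_g_id_u_id_idx"


def plan_details(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def uses_index(conn: sqlite3.Connection, sql: str, index_name: str = INDEX_NAME) -> bool:
    """True if any step of the query plan mentions index_name."""
    return any(index_name in detail for detail in plan_details(conn, sql))


conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE dir_memberships (
        id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER, deleted INTEGER);
    -- SQLite has no separate boolean type, so 0 plays the role of FALSE here
    CREATE INDEX {INDEX_NAME} ON dir_memberships (group_id, user_id) WHERE deleted = 0;
""")

# Repeats the index predicate, so the partial index is usable.
MATCHING = "SELECT user_id FROM dir_memberships WHERE group_id = 15499 AND deleted = 0"
# Omits the predicate: rows with deleted <> 0 are not in the index, so it can't be used.
NOT_MATCHING = "SELECT user_id FROM dir_memberships WHERE group_id = 15499"

if __name__ == "__main__":
    print(plan_details(conn, MATCHING))
    print(plan_details(conn, NOT_MATCHING))
```

    In PostgreSQL, the equivalent check is to run EXPLAIN on the real query and look for the index name in the plan.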

    I assume that you already have primary keys and indexes on relevant foreign keys.

    Further:

    CREATE INDEX dir_memberships_u_id_role_idx ON dir_memberships (user_id, role)
    WHERE  deleted = FALSE;
    

    Why user_id a second time? See:

    • Working of indexes in PostgreSQL
    • Is a composite index also good for queries on the first field?
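    The core point of those links is that column order matters: a B-tree index on (user_id, role) supports a direct search on user_id alone, but not on role alone, and user_id is only the second column in the (group_id, user_id) index. A rough illustration, again using SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (the table layout is a hypothetical cut-down dir_memberships):

```python
import sqlite3

INDEX_NAME = "dir_memberships_u_id_role_idx"

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE dir_memberships (
        id INTEGER PRIMARY KEY, user_id INTEGER, role INTEGER, expires TEXT);
    CREATE INDEX {INDEX_NAME} ON dir_memberships (user_id, role);
""")


def index_search(conn: sqlite3.Connection, sql: str, index_name: str = INDEX_NAME) -> bool:
    """True if EXPLAIN QUERY PLAN reports a SEARCH (not a full SCAN) using index_name."""
    return any(
        detail.startswith("SEARCH") and index_name in detail
        for *_, detail in conn.execute("EXPLAIN QUERY PLAN " + sql)
    )


# 'expires' is not part of the index, so SQLite cannot answer from the index alone.
LEADING_COLUMN = "SELECT expires FROM dir_memberships WHERE user_id = 7"
BOTH_COLUMNS = "SELECT expires FROM dir_memberships WHERE user_id = 7 AND role = 3"
SECOND_COLUMN_ONLY = "SELECT expires FROM dir_memberships WHERE role = 3"

if __name__ == "__main__":
    for sql in (LEADING_COLUMN, BOTH_COLUMNS, SECOND_COLUMN_ONLY):
        print(index_search(conn, sql), sql)
```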

    Also, since user_id is already used in another index, you are not blocking HOT updates (which are only possible if the UPDATE changes none of the columns involved in any index).
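    The HOT rule can be modelled in a few lines of Python. This is a deliberately simplified sketch: the index list mirrors the two indexes suggested above plus an assumed primary key on id, and columns referenced in a partial-index predicate (deleted) count as indexed, because PostgreSQL counts them too. It ignores the other HOT precondition, namely that the new row version must fit on the same heap page. IndexDef and hot_update_possible are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    """Simplified index description: key columns plus partial-index predicate columns."""
    name: str
    key_columns: frozenset
    predicate_columns: frozenset = frozenset()

    @property
    def referenced_columns(self) -> frozenset:
        return self.key_columns | self.predicate_columns


def indexed_columns(indexes) -> frozenset:
    """Union of every column referenced by any index on the table."""
    return frozenset().union(*(ix.referenced_columns for ix in indexes))


def hot_update_possible(updated_columns, indexes) -> bool:
    """HOT is only an option if the UPDATE changes no indexed column.
    (Simplification: free space on the heap page is not modelled.)"""
    return indexed_columns(indexes).isdisjoint(updated_columns)


PKEY = IndexDef("dir_memberships_pkey", frozenset({"id"}))  # assumed primary key
G_U_IDX = IndexDef("dir_memberships_g_id_u_id_idx",
                   frozenset({"group_id", "user_id"}), frozenset({"deleted"}))
U_ROLE_IDX = IndexDef("dir_memberships_u_id_role_idx",
                      frozenset({"user_id", "role"}), frozenset({"deleted"}))
MEMBERSHIP_INDEXES = [PKEY, G_U_IDX, U_ROLE_IDX]

if __name__ == "__main__":
    print(hot_update_possible({"expires"}, MEMBERSHIP_INDEXES))   # expires is not indexed
    print(hot_update_possible({"deleted"}, MEMBERSHIP_INDEXES))   # used in index predicates
```

    In this model, adding dir_memberships_u_id_role_idx only adds role to the set of HOT-blocking columns; user_id was already in it.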

    Why role?
    I assume both columns are of type integer (4 bytes). I saw in your detailed question that you run a 64-bit OS, where MAXALIGN is 8 bytes, so a second integer column will not make the index grow at all. I added role because it might be useful for the second EXISTS semi-join.
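    The arithmetic behind that claim can be sketched in Python. Assumptions: an 8-byte index tuple header (6-byte heap TID plus 2 bytes of size/flags), no NULLs (so no null bitmap), MAXALIGN = 8 as on the 64-bit build from the question, and integer columns of 4 bytes with 4-byte alignment. The 4-byte line pointer each entry also occupies on the page is ignored. The helper names are hypothetical.

```python
MAXALIGN = 8            # typical for 64-bit builds, as in the question
INDEX_TUPLE_HEADER = 8  # heap TID (6 bytes) + t_info (2 bytes)

INT4 = (4, 4)  # (length in bytes, alignment) for integer
INT8 = (8, 8)  # (length in bytes, alignment) for bigint


def align_up(n: int, alignment: int) -> int:
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def index_tuple_size(columns) -> int:
    """Approximate on-page size of one B-tree index tuple without NULLs."""
    offset = INDEX_TUPLE_HEADER
    for length, alignment in columns:
        offset = align_up(offset, alignment) + length  # pad, then place the value
    return align_up(offset, MAXALIGN)                  # whole tuple is MAXALIGNed


if __name__ == "__main__":
    print(index_tuple_size([INT4]))        # (user_id)
    print(index_tuple_size([INT4, INT4]))  # (user_id, role)
```

    Under these assumptions, (user_id) and (user_id, role) both come out at 16 bytes per index tuple: the second integer fills padding that would otherwise be wasted. A third integer column would push the tuple to 24 bytes.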

    If you have many "dead" users, this might also help:

    CREATE INDEX dir_users_id_idx ON dir_users (id)
    WHERE status = 'active' AND deleted = FALSE;
    

    As always, check with EXPLAIN to see whether the indexes actually get used. You wouldn't want useless indexes consuming resources.

    Are we fast yet?


    Of course, all the usual advice for performance optimization applies, too.
