Postgres: Why is performance so bad on subselects with OFFSET/LIMIT?

梦谈多话 2021-01-20 06:38

Can you please help me understand the reason for the performance drop between these statements?

For me it seems like in case of D & E he is first joining the add

2 answers
  • 2021-01-20 06:41

    I think the lookup expressed as a subquery in the SELECT clause is being executed even for the 100000 rows that OFFSET skips, which never reach the final result set.

    How about this:

    SELECT s2.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
    FROM (select *
          from   subscribers s
          ORDER BY s.user_id
          OFFSET 100000 LIMIT 200) s2
    

    Failing that, try a common table expression:

    With s2 as (
      select *
      from   subscribers s
      ORDER BY s.user_id
      OFFSET 100000 LIMIT 200)
    SELECT s2.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
    FROM s2
    
  • 2021-01-20 06:48

    This seems to perform reasonably well for the ranks={1,2} case. (FYI: CTEs performed terribly.)

    -- EXPLAIN ANALYZE
    SELECT s.user_id
            , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
            , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
    FROM subscribers s
    JOIN (  SELECT user_id, address_id
            , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
            FROM address
            )a0 ON a0.user_id = s.user_id AND a0.rn <= 2
    GROUP BY s.user_id
    ORDER BY s.user_id
    OFFSET 10000 LIMIT 200
            ;
    

    UPDATE: the query below seems to perform slightly better:

    -- ----------------------------------
    -- EXPLAIN ANALYZE
    SELECT s.user_id
            , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
            , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
    FROM ( SELECT user_id
            FROM subscribers
            ORDER BY user_id
            OFFSET 10000
            LIMIT 200
            ) s 
    JOIN (     SELECT user_id, address_id
            , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
            FROM address
            ) a0 ON a0.user_id = s.user_id AND a0.rn <= 2
    GROUP BY s.user_id
    ORDER BY s.user_id
            ;
    

    Note: in both queries the JOINs should probably be LEFT JOINs, to allow for the 1st and 2nd address to be missing.
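
    As a rough sketch of that note, here is the second query with the join changed to a LEFT JOIN. It assumes the same `subscribers` and `address` tables as above and has not been timed against the original data, so treat it as an illustration rather than a measured improvement. Subscribers with no address rows should come back with NULL in both `ad1` and `ad2` instead of disappearing from the result.

    ```sql
    -- EXPLAIN ANALYZE
    SELECT s.user_id
            -- CASE without ELSE yields NULL, which MAX ignores
            , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id END) AS ad1
            , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id END) AS ad2
    FROM ( SELECT user_id
            FROM subscribers
            ORDER BY user_id
            OFFSET 10000
            LIMIT 200
            ) s
    -- LEFT JOIN keeps subscribers that have fewer than two addresses (or none)
    LEFT JOIN ( SELECT user_id, address_id
            , row_number() OVER (PARTITION BY user_id ORDER BY address_id) AS rn
            FROM address
            ) a0 ON a0.user_id = s.user_id AND a0.rn <= 2
    GROUP BY s.user_id
    ORDER BY s.user_id
            ;
    ```

    The `a0.rn <= 2` filter stays in the ON clause on purpose: moving it to a WHERE clause would discard the NULL-extended rows and turn the query back into an inner join.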


    UPDATE: combining the subsetting subquery (as in @David Aldridge's answer) with the original approach (two scalar subqueries):

    Self-joining the subscribers table lets the scalar subqueries use indexes, and they no longer run for the first 100K result rows that would only be thrown away.

    -- EXPLAIN ANALYZE
    SELECT s.user_id
    , (SELECT address_id
            FROM address a
            WHERE a.user_id = s.user_id
            ORDER BY address_id OFFSET 0 LIMIT 1
            ) AS a_id1
    , (SELECT address_id
            FROM address a
            WHERE a.user_id = s.user_id
            ORDER BY address_id OFFSET 1 LIMIT 1
            ) AS a_id2
    FROM subscribers s
    JOIN (
            SELECT user_id
            FROM subscribers
            ORDER BY user_id
            OFFSET 10000 LIMIT 200
            ) x ON x.user_id = s.user_id
    ORDER BY s.user_id
            ;
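
    The scalar subqueries can only be answered from an index if a suitable one exists. The thread does not show the schema, so the following is an assumption: a composite index on `address (user_id, address_id)`, with a hypothetical index name, would match both the `WHERE a.user_id = s.user_id` filter and the `ORDER BY address_id` in each subquery.

    ```sql
    -- Hypothetical index; the name and its absence from the real schema are assumptions.
    -- Column order matters: equality column first, then the ORDER BY column.
    CREATE INDEX IF NOT EXISTS address_user_id_address_id_idx
        ON address (user_id, address_id);

    -- Refresh planner statistics so the new index is costed sensibly.
    ANALYZE address;
    ```

    Running the query above under EXPLAIN ANALYZE before and after is the simplest way to check whether the planner actually picks the index up.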
    