Postgres: Why is the performance so bad on subselects with OFFSET/LIMIT?

梦谈多话

Can you please help me understand the reason for the performance drop between these statements?

To me it seems that in cases D & E it is first joining the add

2 Answers

夕颜

2021-01-20 06:41

I think that the scalar subquery in the SELECT clause (which acts as a join to address) is being executed even for the 100000 rows that OFFSET discards from the final data set.
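The original statements are not reproduced in the question as shown here, so the following is only an assumed sketch of the slow shape being described: the scalar subquery sits in the select list while OFFSET/LIMIT is applied to the outer query. The table and column names are taken from the queries in this thread.

```sql
-- Assumed shape of the slow query (not the asker's actual statement).
-- OFFSET is applied after the select list is evaluated, so the scalar
-- subquery may run for each of the 100000 skipped rows as well as the
-- 200 rows that are returned.
SELECT s.user_id,
       (SELECT a.address_id
        FROM   address a
        WHERE  a.user_id = s.user_id
        ORDER  BY a.address_id
        LIMIT  1) AS a_id
FROM   subscribers s
ORDER  BY s.user_id
OFFSET 100000 LIMIT 200;
```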

How about this:

SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM (select *
      from   subscribers s
      ORDER BY s.user_id
      OFFSET 100000 LIMIT 200) s2

Failing that, try a common table expression:

With s2 as (
  select *
  from   subscribers s
  ORDER BY s.user_id
  OFFSET 100000 LIMIT 200)
SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM s2
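Whether the CTE helps depends on the server version. Before PostgreSQL 12 a CTE is always materialized; from version 12 on, the planner may inline it unless MATERIALIZED is specified. A sketch for checking what the planner actually does, assuming PostgreSQL 12 or later:

```sql
-- Assumes PostgreSQL 12+, where the MATERIALIZED keyword is accepted.
-- EXPLAIN (ANALYZE, BUFFERS) runs the query and reports the real plan,
-- including how many times the scalar subquery was executed (loops=...).
EXPLAIN (ANALYZE, BUFFERS)
WITH s2 AS MATERIALIZED (
  SELECT user_id
  FROM   subscribers
  ORDER  BY user_id
  OFFSET 100000 LIMIT 200)
SELECT s2.user_id,
       (SELECT a.address_id
        FROM   address a
        WHERE  a.user_id = s2.user_id
        ORDER  BY a.address_id
        LIMIT  1) AS a_id
FROM   s2
ORDER  BY s2.user_id;  -- outer ORDER BY makes the result order explicit
```

If the subquery node in the plan reports roughly 200 loops rather than 100200, the subselect is only being evaluated for the rows that are kept.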

星月不相逢

2021-01-20 06:48

This seems to perform reasonably for the ranks={1,2} case. (CTEs were terrible, FYI.)

-- EXPLAIN ANALYZE
SELECT s.user_id
        , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
        , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
FROM subscribers s
JOIN (  SELECT user_id, address_id
        , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
        FROM address
        )a0 ON a0.user_id = s.user_id AND a0.rn <= 2
GROUP BY s.user_id
ORDER BY s.user_id
OFFSET 10000 LIMIT 200
        ;

UPDATE: the query below seems to perform slightly better:

-- EXPLAIN ANALYZE
SELECT s.user_id
        , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
        , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
FROM ( SELECT user_id
        FROM subscribers
        ORDER BY user_id
        OFFSET 10000
        LIMIT 200
        ) s 
JOIN (     SELECT user_id, address_id
        , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
        FROM address
        ) a0 ON a0.user_id = s.user_id AND a0.rn <= 2
GROUP BY s.user_id
ORDER BY s.user_id
        ;

Note: in both queries the JOINs should probably be LEFT JOINs, so that subscribers whose 1st or 2nd address is missing are still returned.
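As a sketch of that note, here is the second query with the inner join changed to a LEFT JOIN. Nothing else is altered; the rn <= 2 filter stays in the ON clause so that it does not turn the outer join back into an inner join.

```sql
-- Same query as above, but subscribers without any address are kept;
-- for them ad1 and ad2 come back as NULL.
SELECT s.user_id
     , MAX(CASE WHEN a0.rn = 1 THEN a0.address_id END) AS ad1
     , MAX(CASE WHEN a0.rn = 2 THEN a0.address_id END) AS ad2
FROM ( SELECT user_id
       FROM subscribers
       ORDER BY user_id
       OFFSET 10000
       LIMIT 200
     ) s
LEFT JOIN ( SELECT user_id, address_id
                 , row_number() OVER (PARTITION BY user_id ORDER BY address_id) AS rn
            FROM address
          ) a0 ON a0.user_id = s.user_id AND a0.rn <= 2
GROUP BY s.user_id
ORDER BY s.user_id
;
```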

UPDATE: combining the subsetting subquery (as in @David Aldridge's answer) with the original approach (two scalar subqueries):

Joining the subscribers table to itself allows indexes to be used for the scalar subqueries, and the subqueries no longer run for the first 100K result rows that would only be thrown away.

-- EXPLAIN ANALYZE
SELECT s.user_id
, (SELECT address_id
        FROM address a
        WHERE a.user_id = s.user_id
        ORDER BY address_id OFFSET 0 LIMIT 1
        ) AS a_id1
, (SELECT address_id
        FROM address a
        WHERE a.user_id = s.user_id
        ORDER BY address_id OFFSET 1 LIMIT 1
        ) AS a_id2
FROM subscribers s
JOIN (
        SELECT user_id
        FROM subscribers
        ORDER BY user_id
        OFFSET 10000 LIMIT 200
        ) x ON x.user_id = s.user_id
ORDER BY s.user_id
        ;
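The scalar subqueries above can only use an index if a suitable one exists. The thread does not show the table definitions, so the following is a sketch under the assumption that address has no index leading with user_id yet; the index name is hypothetical.

```sql
-- Hypothetical index name; assumes no equivalent index already exists.
-- (user_id, address_id) matches the WHERE a.user_id = ... ORDER BY address_id
-- pattern, so each scalar subquery can be answered by a short index scan.
CREATE INDEX IF NOT EXISTS address_user_id_address_id_idx
    ON address (user_id, address_id);

-- Refresh planner statistics for the table after building the index.
ANALYZE address;
```

The subsetting subquery on subscribers benefits in the same way from an index on subscribers (user_id), which is likely already present if user_id is the primary key.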
