Can you please help me understand the reason for the performance drop between these statements?
For me it seems like in case of D & E he is first joining the add
I think that the join expressed in the SELECT clause is being executed even for the 100000 rows you are not including in the final data set.
How about this:
SELECT s2.user_id,
       (SELECT address_id
        FROM address a
        WHERE a.user_id = s2.user_id
        ORDER BY address_id
        OFFSET 0 LIMIT 1) AS a_id
FROM (SELECT *
      FROM subscribers s
      ORDER BY s.user_id
      OFFSET 100000 LIMIT 200) s2
Failing that, try a common table expression:
WITH s2 AS (
    SELECT *
    FROM subscribers s
    ORDER BY s.user_id
    OFFSET 100000 LIMIT 200)
SELECT s2.user_id,
       (SELECT address_id
        FROM address a
        WHERE a.user_id = s2.user_id
        ORDER BY address_id
        OFFSET 0 LIMIT 1) AS a_id
FROM s2
This seems to perform reasonably for the ranks={1,2} case. (CTEs were terrible, FYI.)
-- EXPLAIN ANALYZE
SELECT s.user_id
     , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
     , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
FROM subscribers s
JOIN ( SELECT user_id, address_id
            , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
       FROM address
     ) a0 ON a0.user_id = s.user_id AND a0.rn <= 2
GROUP BY s.user_id
ORDER BY s.user_id
OFFSET 10000 LIMIT 200
;
UPDATE: the query below seems to perform slightly better:
-- ----------------------------------
-- EXPLAIN ANALYZE
SELECT s.user_id
     , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
     , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
FROM ( SELECT user_id
       FROM subscribers
       ORDER BY user_id
       OFFSET 10000
       LIMIT 200
     ) s
JOIN ( SELECT user_id, address_id
            , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
       FROM address
     ) a0 ON a0.user_id = s.user_id AND a0.rn <= 2
GROUP BY s.user_id
ORDER BY s.user_id
;
Note: in both queries the JOINs should probably be LEFT JOINs, to allow for the 1st and 2nd address to be missing.
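As a sketch of what that note means, here is the second query with the JOIN swapped for a LEFT JOIN. It assumes the same subscribers and address tables as above and has not been benchmarked. The rank filter stays in the ON clause: moving it to WHERE would discard the NULL-extended rows and turn the join back into an inner join.

```sql
-- Sketch only: LEFT JOIN variant of the query above.
-- Subscribers without any address still appear, with ad1 and ad2 both NULL.
SELECT s.user_id
     , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
     , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
FROM ( SELECT user_id
       FROM subscribers
       ORDER BY user_id
       OFFSET 10000
       LIMIT 200
     ) s
LEFT JOIN ( SELECT user_id, address_id
                 , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
            FROM address
          ) a0 ON a0.user_id = s.user_id AND a0.rn <= 2  -- filter in ON, not WHERE
GROUP BY s.user_id
ORDER BY s.user_id
;
```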
UPDATE: combining the subsetting subquery (like in @David Aldridge's answer) with the original (two scalar subqueries):
Self-joining the subscribers table allows indexes to be used for the scalar subqueries, without the need to evaluate them for the first 100K result rows and then throw those rows away.
-- EXPLAIN ANALYZE
SELECT s.user_id
     , (SELECT address_id
        FROM address a
        WHERE a.user_id = s.user_id
        ORDER BY address_id OFFSET 0 LIMIT 1
       ) AS a_id1
     , (SELECT address_id
        FROM address a
        WHERE a.user_id = s.user_id
        ORDER BY address_id OFFSET 1 LIMIT 1
       ) AS a_id2
FROM subscribers s
JOIN (
     SELECT user_id
     FROM subscribers
     ORDER BY user_id
     OFFSET 10000 LIMIT 200
     ) x ON x.user_id = s.user_id
ORDER BY s.user_id
;
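The scalar subqueries above can only be index-driven if a suitable index exists. The thread does not show the table definitions, so the following is an assumption rather than something taken from the question: a composite index on address covering both the lookup column and the sort column. The index name is made up for illustration.

```sql
-- Assumption: no equivalent index exists yet; the index name is hypothetical.
-- (user_id, address_id) matches "WHERE a.user_id = ... ORDER BY address_id",
-- so each scalar subquery can read its one row straight from the index.
CREATE INDEX address_user_id_address_id_idx
    ON address (user_id, address_id);
```

Running the query with EXPLAIN ANALYZE (uncomment the first line) should show whether the planner actually uses it.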