I can't seem to find a relevant example out there.
I'm trying to return a subset of a table, and for each row in that table, I want to check how many children it has.
An explanation of why @cletus is wrong.
First, props on doing the research.
Second, you are doing it wrong.
Explanation:
Original query:
EXPLAIN
SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID = a.ID) AS ChildCount
FROM Blah a
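To see what this query returns, here is a minimal sketch using Python's built-in sqlite3 module on a tiny, made-up Blah table. The sample rows and the `make_db` / `child_counts_correlated` helpers are hypothetical, and SQLite stands in for PostgreSQL purely for illustration. The correlated subquery is evaluated once for each outer row of `a`.

```python
import sqlite3


def make_db():
    """Build an in-memory Blah table with a hypothetical parent/child layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Blah (ID INTEGER PRIMARY KEY, ParentID INTEGER)")
    # 1 and 2 are roots; 3 and 4 are children of 1; 5 is a child of 3.
    conn.executemany(
        "INSERT INTO Blah (ID, ParentID) VALUES (?, ?)",
        [(1, None), (2, None), (3, 1), (4, 1), (5, 3)],
    )
    return conn


def child_counts_correlated(conn):
    """Same shape as the original query: one COUNT subquery per outer row."""
    rows = conn.execute(
        """
        SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID = a.ID) AS ChildCount
        FROM Blah a
        ORDER BY ID
        """
    ).fetchall()
    return dict(rows)


conn = make_db()
print(child_counts_correlated(conn))  # {1: 2, 2: 0, 3: 1, 4: 0, 5: 0}
```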
Result:
Seq Scan on blah a (cost=0.00..145180063607.45 rows=2773807 width=4)
  SubPlan
    -> Aggregate (cost=52339.61..52339.63 rows=1 width=0)
          -> Seq Scan on blah (cost=0.00..52339.59 rows=10 width=0)
                Filter: (parentid = $0)
What happens when you wrap it in SELECT count(1):
EXPLAIN SELECT count(1) FROM (
SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID = a.ID) AS ChildCount
FROM Blah a) AS bar
Result:
Aggregate (cost=52339.59..52339.60 rows=1 width=0)
  -> Seq Scan on blah a (cost=0.00..45405.07 rows=2773807 width=0)
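The first plan's enormous total is roughly what you get by charging the inner Aggregate once per outer row. Here is a back-of-the-envelope check in Python using only numbers printed in the two plans. Treating the bare outer scan cost from the wrapped plan as the base cost is an assumption of this sketch, and the tiny leftover comes from per-row planner details it does not model.

```python
# Figures copied from the two EXPLAIN outputs.
outer_rows = 2_773_807               # rows=2773807 on "Seq Scan on blah a"
outer_scan_cost = 45_405.07          # total cost of the bare outer scan (wrapped plan)
subplan_cost = 52_339.63             # total cost of one Aggregate over the inner Seq Scan
reported_total = 145_180_063_607.45  # top-line cost of the correlated-subquery plan

# A correlated SubPlan is re-run for every outer row, so the estimate
# grows as (outer rows) * (cost of one full inner scan): quadratic overall.
estimate = outer_scan_cost + outer_rows * subplan_cost
relative_error = abs(estimate - reported_total) / reported_total

print(f"estimate {estimate:,.0f} vs reported {reported_total:,.0f}")
print(f"relative error: {relative_error:.1e}")
```

The point is the shape of the formula, not the last few digits: each of the ~2.7 million outer rows pays for a full sequential scan of the table.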
Notice the difference?
The optimizer is smart enough to see that the outer count(1) never uses ChildCount, so it doesn't need to run the subquery at all (there is no SubPlan in the second plan). So it's not that correlated subqueries are fast; it's that NOT DOING THEM is fast :-).
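The wrapped query only needs the number of rows coming out of `bar`, never the value of ChildCount, so its answer is just the table's row count. A small sqlite3 sketch on made-up rows checks that equivalence. It compares results only; SQLite's planner is not PostgreSQL's, so this says nothing about which plan either engine picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Blah (ID INTEGER PRIMARY KEY, ParentID INTEGER)")
# Hypothetical sample data: two roots and three children.
conn.executemany(
    "INSERT INTO Blah (ID, ParentID) VALUES (?, ?)",
    [(1, None), (2, None), (3, 1), (4, 1), (5, 3)],
)

# The wrapped query from above: ChildCount is computed in bar but never read.
wrapped = conn.execute(
    """
    SELECT count(1) FROM (
        SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID = a.ID) AS ChildCount
        FROM Blah a
    ) AS bar
    """
).fetchone()[0]

# Counting the table directly gives the same answer without any subquery.
plain = conn.execute("SELECT count(1) FROM Blah").fetchone()[0]

print(wrapped, plain)  # 5 5
```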
Unfortunately it can't do the same for a left outer join, since the number of result rows is not determined by the first scan alone.
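For reference, a left outer join that returns the same child counts is usually written with a GROUP BY so each parent collapses back to one row. The exact join query behind the timings isn't shown, so this is a hypothetical reconstruction, run against SQLite on toy data. Before grouping, the join can emit several rows per row of `a` (one per child), which is why the row count isn't known from the first scan.

```python
import sqlite3


def make_db():
    """Hypothetical sample data: two roots and three children."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Blah (ID INTEGER PRIMARY KEY, ParentID INTEGER)")
    conn.executemany(
        "INSERT INTO Blah (ID, ParentID) VALUES (?, ?)",
        [(1, None), (2, None), (3, 1), (4, 1), (5, 3)],
    )
    return conn


CORRELATED = """
    SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID = a.ID) AS ChildCount
    FROM Blah a
    ORDER BY ID
"""

# COUNT(c.ID) skips the NULLs produced for childless rows, so they get 0, not 1.
LEFT_JOIN = """
    SELECT a.ID, COUNT(c.ID) AS ChildCount
    FROM Blah a
    LEFT OUTER JOIN Blah c ON c.ParentID = a.ID
    GROUP BY a.ID
    ORDER BY a.ID
"""

conn = make_db()
print(conn.execute(LEFT_JOIN).fetchall())  # [(1, 2), (2, 0), (3, 1), (4, 0), (5, 0)]
print(conn.execute(CORRELATED).fetchall() == conn.execute(LEFT_JOIN).fetchall())  # True
```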
Lesson #1: The query plans tell you a hell of a lot. Poor experiment design gets you into trouble.
Lesson #1.1: If you don't need to do a join, by all means, don't.
I created a test dataset of roughly 2.7 million rows.
The left outer join, without the wrapper, ran in 171,757 ms on my laptop.
The correlated subquery... I'll update when it finishes; I'm at 700K ms and it's still running.
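If you want to repeat the comparison at small scale, here is a rough timing harness. Everything in it is an assumption: the synthetic tree generator, the row count `n`, the random parent assignment, and the hypothetical join query; it is not the 2.7M-row dataset above. SQLite's absolute numbers won't match PostgreSQL's. The point is to time both queries on identical data and confirm they agree.

```python
import random
import sqlite3
import time

CORRELATED = """
    SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID = a.ID) AS ChildCount
    FROM Blah a
    ORDER BY ID
"""

LEFT_JOIN = """
    SELECT a.ID, COUNT(c.ID) AS ChildCount
    FROM Blah a
    LEFT OUTER JOIN Blah c ON c.ParentID = a.ID
    GROUP BY a.ID
    ORDER BY a.ID
"""


def build(n, seed=0):
    """Hypothetical synthetic tree: row 1 is the root, row i > 1 gets a random earlier parent."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Blah (ID INTEGER PRIMARY KEY, ParentID INTEGER)")
    rows = [(i, rng.randint(1, i - 1) if i > 1 else None) for i in range(1, n + 1)]
    conn.executemany("INSERT INTO Blah (ID, ParentID) VALUES (?, ?)", rows)
    return conn


def timed(conn, sql):
    """Run sql to completion and return (rows, elapsed milliseconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, (time.perf_counter() - start) * 1000.0


conn = build(1000)
corr_rows, corr_ms = timed(conn, CORRELATED)
join_rows, join_ms = timed(conn, LEFT_JOIN)

# Both queries must return identical (ID, ChildCount) rows.
assert corr_rows == join_rows
print(f"correlated subquery: {corr_ms:.1f} ms, left join: {join_ms:.1f} ms")
```

Raising `n` shows how each query scales on your machine. Keep in mind that SQLite may choose different strategies (for example, an automatic index for the join) than PostgreSQL did in the plans above.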
Lesson #2: When someone tells you to look at the query plan and claims it shows a difference in algorithmic order... look at the query plan.