SQL Parent/Child recursive call or union?

有刺的猬 2021-02-10 11:04

I can't seem to find a relevant example out there.

I'm trying to return a subset of a table, and for each row in that table, I want to check how many children it has,

5 answers
  •  南笙 (original poster)
    2021-02-10 11:21

    An explanation of why @cletus is wrong.

    First, props on doing the research.

    Second, you are doing it wrong.

    Explanation:

    Original query:

    EXPLAIN
    SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID= a.ID) as ChildCount
    FROM Blah a
    

    Result:

        "Seq Scan on blah a  (cost=0.00..145180063607.45 rows=2773807 width=4)"
        "  SubPlan"
        "    ->  Aggregate  (cost=52339.61..52339.63 rows=1 width=0)"
        "          ->  Seq Scan on blah  (cost=0.00..52339.59 rows=10 width=0)"
        "                Filter: (parentid = $0)"
    

    What happens when you wrap it in "select count(1)":

    EXPLAIN SELECT count(1) FROM (
    SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID= a.ID) as ChildCount
    FROM Blah a) as bar
    
        "Aggregate  (cost=52339.59..52339.60 rows=1 width=0)"
        "  ->  Seq Scan on blah a  (cost=0.00..45405.07 rows=2773807 width=0)"
    

    Notice the difference?

    The optimizer is smart enough to see that it doesn't need to do the subquery. So it's not that correlated subqueries are fast; it's that NOT DOING THEM is fast :-).

    Unfortunately it can't do the same for a left outer join, since the number of results is not pre-determined by the first scan.
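
    The left outer join being compared is not reproduced in this answer. A minimal sketch, assuming it takes the usual self-join form on the same Blah table (the alias c is an assumption, not taken from the original post):

    ```sql
    -- Assumed form of the left outer join; the original is not shown here.
    -- COUNT(c.ID) counts only matched child rows, so a row with no
    -- children still appears in the result with ChildCount = 0.
    SELECT a.ID, COUNT(c.ID) AS ChildCount
    FROM Blah a
    LEFT OUTER JOIN Blah c ON c.ParentID = a.ID
    GROUP BY a.ID;
    ```

    Prefixing this with EXPLAIN, as with the queries above, shows the plan the optimizer picks for it.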

    Lesson #1: The query plans tell you a hell of a lot. Poor experiment design gets you into trouble.

    Lesson #1.1: If you don't need to do a join, by all means, don't.

    I created a test dataset of roughly 2.7 million rows.
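
    The schema and data used for that test are not given. One way such a dataset might be built in PostgreSQL, as a sketch only (the column types, the row count of 2,700,000 and the parent assignment rule are all assumptions):

    ```sql
    -- Hypothetical test table; the answer does not show its schema.
    CREATE TABLE Blah (
        ID       integer PRIMARY KEY,
        ParentID integer              -- NULL for root rows
    );

    -- generate_series is PostgreSQL-specific. Row i is attached to parent
    -- i / 10 (integer division), so every parent gets at most 10 children;
    -- rows 1 to 9 would point at a non-existent parent 0 and become roots.
    INSERT INTO Blah (ID, ParentID)
    SELECT i, NULLIF(i / 10, 0)
    FROM generate_series(1, 2700000) AS s(i);

    -- Refresh planner statistics so the EXPLAIN estimates reflect the data.
    ANALYZE Blah;
    ```

    No index is created on ParentID here, which matches the sequential scan on the inner Blah in the plan above.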

    The left outer join -- without the wrapper -- ran in 171,757 ms on my laptop.

    The correlated subquery... I'll update when it finishes, I am at 700K ms and it's still running.
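
    How these timings were taken is not stated. In PostgreSQL, one option is EXPLAIN ANALYZE, which executes the statement and reports actual run time next to the estimated costs. A sketch using the correlated subquery from above:

    ```sql
    -- EXPLAIN ANALYZE really runs the query, so on a table this size
    -- it takes as long as the query itself.
    EXPLAIN ANALYZE
    SELECT ID, (SELECT COUNT(1) FROM Blah WHERE ParentID = a.ID) AS ChildCount
    FROM Blah a;
    ```

    The same prefix works on any other SELECT, which keeps the comparison between the unwrapped queries like for like.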

    Lesson #2: When someone tells you to look at the query plan, and claims it shows a difference in algorithmic order... look at the query plan.
