How to make a SELECT query in Hibernate includes Subquery COUNT(*)

后端 未结 3 1243
说谎
说谎 2021-01-05 18:03

Let\'s say we have a Category - Items one-to-many relation. I would like to do this

SELECT c.*, 
   (SELECT COUNT(*) FROM items i W         


        
3条回答
  •  礼貌的吻别
    2021-01-05 18:37

    About the query

    Inspired by @a_horse's comment I ran a quick test with a real life table for a similar purpose.

    • 80 categories
    • 6862 items
    • optimal indexes (on items.catID. There are more, but not useful here.)

    Three candidates. Results are identical. Query plans and performance vary.

    1) Subselect per category (Seregwethrin's original)

    Total runtime: 20.351 ms

    SELECT c.*
          ,(SELECT COUNT(*) FROM items i WHERE i.catid = c.id) AS item_ct
    FROM category c
    

    2) LEFT JOIN, then GROUP

    Total runtime: 36.320 ms

    SELECT c.*
          ,count(*) AS item_ct
    FROM   category c
    LEFT   JOIN items i ON i.catid = c.id
    GROUP  BY c.catid;  -- prim. key of category
    

    3) GROUP, then LEFT JOIN

    Total runtime: 18.588 ms

    SELECT c.*
          ,item_ct
    FROM   category c
    LEFT   JOIN  (
        SELECT catid
              ,count(*) AS item_ct
        FROM   items
        GROUP  BY catid
        ) i ON i.catid = c.id
    

    So, my first suggestion wasn't any good. As expected (after some thought), version 3) performs best. It makes sense, too: If you count first, and then JOIN, fewer join operations are necessary.

    The difference in performance will become more distinct for bigger tables, especially with more categories.

    For 2) to work you need PostgreSQL 9.1 and category.id needs to be the primary key.
    For older versions you would have to list all non-aggregated columns in the GROUP BY clause.

    I switched to LEFT JOIN generally, because the original query includes categories without associated items.

    The index on items.catID is only used by 1) where multiple subqueries can profit. In the other queries sequential scans are faster: all of both tables has to be read anyway.

提交回复
热议问题