Optimizing multiple joins

悲&欢浪女 2021-02-04 07:14

I'm trying to figure out a way to speed up a particularly cumbersome query which aggregates some data by date across a couple of tables. The full (ugly) query is below along w

3 answers
  •  小蘑菇 (original poster)
    2021-02-04 07:34

    I uninstalled my PostgreSQL server a couple of days ago, so you'll likely have to play around with this, but hopefully it's a good start for you.

    The keys are:

    1. You shouldn't need the subqueries. Just do the direct joins and aggregate.
    2. You should be able to use INNER JOINs, which are typically more performant than OUTER JOINs. Keep in mind that with INNER JOINs, dates that have no matching rows drop out of the result.

    If nothing else, I think that the query below is a bit clearer.

    I used a calendar table in my query, but you can replace that with the generate_series as you were using it.
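
    As a rough sketch of that substitution, the calendar table could be replaced by a CTE built on generate_series. The date range here is purely illustrative (the range from the original question is not shown), and the CTE is named Calendar only so that it lines up with the table used in this answer:

    ```sql
    -- Builds one row per day; the start and end dates are assumed values.
    WITH Calendar AS (
        SELECT g.day::date AS calendar_date
        FROM generate_series(
            TIMESTAMP '2021-01-01',
            TIMESTAMP '2021-01-31',
            INTERVAL '1 day'
        ) AS g(day)
    )
    SELECT calendar_date
    FROM Calendar
    ORDER BY calendar_date;
    ```

    The explicit TIMESTAMP literals keep the series independent of the session time zone before the cast back to date.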

    Also, depending on indexing, it might be better to compare body_time with >= and < rather than casting it to a date and comparing. A cast applied to the column usually keeps a plain index on body_time from being used, whereas a range comparison on the bare column can use one. I don't know enough about PostgreSQL internals to be sure, so I would try both approaches and compare the EXPLAIN ANALYZE output to see which the server optimizes better. In pseudo-code you would be doing: body_time >= date (time=midnight) AND body_time < date + 1 (time=midnight).
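
    A minimal sketch of the range comparison, assuming body_time is a timestamp column and that a plain b-tree index on it is acceptable. The index name is hypothetical, and the query is simplified to a single total so that only the join condition changes:

    ```sql
    -- Hypothetical index; skip this if Body.body_time is already indexed.
    CREATE INDEX body_body_time_idx ON Body (body_time);

    SELECT
        CAL.calendar_date AS period,
        SUM(B.body_size) AS total_size
    FROM
        Calendar CAL
    INNER JOIN Body B ON
        -- Half-open range: from midnight of the day up to, but not
        -- including, midnight of the next day (date + 1 is a date).
        B.body_time >= CAL.calendar_date AND
        B.body_time <  CAL.calendar_date + 1
    GROUP BY
        CAL.calendar_date;
    ```

    Whether the planner actually picks the index depends on table size and statistics, so it is worth running both variants under EXPLAIN ANALYZE.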

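    -- Daily totals: envelope_command = 1 is treated as outbound, 2 as inbound.
    -- Caveat: Body is joined twice on the same date, so verify the sums
    -- against a known day before relying on them.
    SELECT
        CAL.calendar_date AS period,
        SUM(OB.body_size) AS outbound,
        SUM(IB.body_size) AS inbound
    FROM
        Calendar CAL
    -- Outbound bodies and their envelopes
    INNER JOIN Body OB ON
        OB.body_time::date = CAL.calendar_date
    INNER JOIN Envelope OE ON
        OE.message_id = OB.message_id AND
        OE.envelope_command = 1
    -- Inbound bodies and their envelopes
    INNER JOIN Body IB ON
        IB.body_time::date = CAL.calendar_date
    INNER JOIN Envelope IE ON
        IE.message_id = IB.message_id AND
        IE.envelope_command = 2
    GROUP BY
        CAL.calendar_date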
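
    One thing to check with a double join like this: because both Body joins match on the same date, a day with m outbound and n inbound rows produces m × n joined rows, which multiplies each sum. A possible alternative, offered as a sketch rather than a tested replacement, joins Body and Envelope once and splits the totals with conditional aggregation. It assumes the same tables and columns as above and the same meaning for envelope_command values 1 and 2:

    ```sql
    SELECT
        CAL.calendar_date AS period,
        -- Each body row is counted once, in whichever column its envelope selects.
        SUM(CASE WHEN E.envelope_command = 1 THEN B.body_size ELSE 0 END) AS outbound,
        SUM(CASE WHEN E.envelope_command = 2 THEN B.body_size ELSE 0 END) AS inbound
    FROM
        Calendar CAL
    INNER JOIN Body B ON
        B.body_time::date = CAL.calendar_date
    INNER JOIN Envelope E ON
        E.message_id = B.message_id AND
        E.envelope_command IN (1, 2)
    GROUP BY
        CAL.calendar_date
    ORDER BY
        CAL.calendar_date;
    ```

    This version also keeps days that have traffic in only one direction, showing 0 for the other.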
    
