Left outer join acting like inner join

无人共我 2021-01-23 05:20

Summary

My goal is to find every user who has ever been assigned to a task, and then generate some statistics over a particular date range, and associate the stats with

2 Answers
  •  终归单人心
    2021-01-23 05:49

    The query can probably be simplified to:

    SELECT u.name AS user_name
         , p.name AS project_name
         , tl.created_on::date AS changeday
         , coalesce(sum(nullif(new_value, '')::numeric), 0)
         - coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
    FROM   users             u
    LEFT   JOIN (
            tasks            t 
       JOIN fixins           f  ON  f.id = t.fixin_id
       JOIN projects         p  ON  p.id = f.project_id
       JOIN task_log_entries tl ON  tl.task_id = t.id
                               AND  tl.field_id = 18
                               AND (tl.created_on IS NULL OR
                                    tl.created_on >= '2013-09-08' AND
                                    tl.created_on <  '2013-09-09') -- upper border!
           ) ON t.assignee_id = u.id
    WHERE  EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
    GROUP  BY 1, 2, 3
    ORDER  BY 1, 2, 3;
    

    This returns all users who have ever been assigned any task,
    plus data per project and day wherever task_log_entries has rows in the specified date range.
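    To check that behaviour locally, here is a small sketch using Python's built-in sqlite3 module, with SQLite standing in for PostgreSQL. The schema and rows are hypothetical and cover only the columns the query touches. Because SQLite has no `::numeric` or `::date` casts, the sketch uses CAST(... AS REAL) and date(), and it uses a derived table in place of the parenthesized join (logically equivalent here). Under those assumptions it should list alice with her hours, list bob with 0 hours and a NULL project and day (he has a task but no log entries in range), and skip carol, who was never assigned anything.

```python
import sqlite3

# Hypothetical minimal schema and data (SQLite stand-in for the Postgres tables).
SCHEMA = """
CREATE TABLE users            (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects         (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fixins           (id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE tasks            (id INTEGER PRIMARY KEY, fixin_id INTEGER, assignee_id INTEGER);
CREATE TABLE task_log_entries (id INTEGER PRIMARY KEY, task_id INTEGER, field_id INTEGER,
                               created_on TEXT, old_value TEXT, new_value TEXT);
INSERT INTO users    VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');  -- carol has no tasks
INSERT INTO projects VALUES (1, 'Apollo');
INSERT INTO fixins   VALUES (1, 1);
INSERT INTO tasks    VALUES (10, 1, 1), (11, 1, 2);
INSERT INTO task_log_entries (task_id, field_id, created_on, old_value, new_value) VALUES
  (10, 18, '2013-09-08 10:00:00', '2', '5'),   -- in range: +3
  (10, 18, '2013-09-08 15:30:00', '',  '1'),   -- in range, empty old_value: +1
  (10,  7, '2013-09-08 11:00:00', '0', '9'),   -- wrong field_id, ignored
  (11, 18, '2013-09-10 09:00:00', '0', '4');   -- outside the date range, ignored
"""

QUERY = """
SELECT u.name AS user_name
     , s.project_name
     , s.changeday
     , coalesce(sum(CAST(nullif(s.new_value, '') AS REAL)), 0)
     - coalesce(sum(CAST(nullif(s.old_value, '') AS REAL)), 0) AS hours
FROM   users u
LEFT   JOIN (
       -- derived table in place of Postgres' parenthesized join
       SELECT t.assignee_id, p.name AS project_name,
              date(tl.created_on) AS changeday, tl.old_value, tl.new_value
       FROM   tasks t
       JOIN   fixins f            ON  f.id = t.fixin_id
       JOIN   projects p          ON  p.id = f.project_id
       JOIN   task_log_entries tl ON  tl.task_id = t.id
                                  AND tl.field_id = 18
                                  AND (tl.created_on IS NULL OR
                                       tl.created_on >= '2013-09-08' AND
                                       tl.created_on <  '2013-09-09')  -- exclusive upper bound
       ) s ON s.assignee_id = u.id
WHERE  EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3
"""


def run_report():
    """Build the in-memory database and return the report rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in run_report():
        print(row)
```

    REAL is a binary float; in PostgreSQL keep the ::numeric cast, as the points that follow recommend.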

    Major points

    • The aggregate function sum() ignores NULL values, so COALESCE() per row is no longer required once you recast the calculation as the difference of two sums:

       ,coalesce(sum(nullif(new_value, '')::numeric), 0) -
        coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
      

      However, if all values in a group can be NULL or empty strings, sum() returns NULL, so wrap each sum in COALESCE once, as above.
      I use numeric instead of float, a safer alternative that minimizes rounding errors.

    • Your attempt to get distinct values from the join of users and tasks is futile, since you join to tasks once more further down. Flatten the whole query to make it simpler and faster.

    • These positional references are just a notational convenience:

      GROUP BY 1, 2, 3
      ORDER BY 1, 2, 3
      

      They do the same as in your original query.

    • To get a date from a timestamp you can simply cast to date:

      tl.created_on::date AS changeday
      

      But it's much better to compare the original column values in the WHERE clause or JOIN condition (if possible, and it is possible here), so Postgres can use plain indexes on the column (if available):

       AND (tl.created_on IS NULL OR
            tl.created_on >= '2013-09-08' AND
            tl.created_on <  '2013-09-09')  -- next day as excluded upper border
      

      Note that a date literal is converted to a timestamp at 00:00 of that day in your current time zone. So pick the next day and exclude it as the upper border, or provide a more explicit timestamp literal like '2013-09-22 0:0 +2'::timestamptz. More on excluding the upper border:

      • Calculate number of concurrent events in SQL
      • Find overlapping date ranges in PostgreSQL
    • For the requirement "every user who has ever been assigned to a task", add the WHERE clause:

      WHERE EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
      
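    • Most importantly: a LEFT [OUTER] JOIN preserves all rows of the table to the left of the join. A WHERE condition on a column of the right table can void this effect: rows without a match carry NULL in those columns, fail the condition and are removed, so the LEFT JOIN behaves like an inner join. Instead, move the filter expression into the JOIN clause. More explanation here: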

      • Query with LEFT JOIN not returning rows for count of 0
    • Parentheses can be used to force the order in which tables are joined. This is rarely needed for simple queries, but very useful in this case: I use it to join tasks, fixins, projects and task_log_entries before left-joining all of it to users, without a subquery.

    • Table aliases make writing complex queries easier.
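    The core issue from the question title, a LEFT JOIN acting like an inner join, can be reproduced with a minimal sketch using Python's sqlite3 module (SQLite instead of PostgreSQL; the tables and data are hypothetical). It runs the same date filter once in the WHERE clause and once in the ON clause.

```python
import sqlite3

# Hypothetical data: alice has a task in range, bob only an older task, carol none.
SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, assignee_id INTEGER, created_on TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO tasks VALUES (10, 1, '2013-09-08 10:00:00'),  -- alice: inside the range
                         (11, 2, '2013-09-01 09:00:00');  -- bob: before the range
"""

# Filter on the right-hand table in WHERE: NULL-extended rows fail the test,
# so the LEFT JOIN degenerates into an inner join.
FILTER_IN_WHERE = """
SELECT u.name, count(t.id)
FROM   users u
LEFT   JOIN tasks t ON t.assignee_id = u.id
WHERE  t.created_on >= '2013-09-08'
GROUP  BY u.name
ORDER  BY u.name
"""

# Same filter moved into the join condition: every user on the left is kept.
FILTER_IN_ON = """
SELECT u.name, count(t.id)
FROM   users u
LEFT   JOIN tasks t ON  t.assignee_id = u.id
                    AND t.created_on >= '2013-09-08'
GROUP  BY u.name
ORDER  BY u.name
"""


def compare():
    """Return (rows with filter in WHERE, rows with filter in ON)."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    where_rows = con.execute(FILTER_IN_WHERE).fetchall()
    on_rows = con.execute(FILTER_IN_ON).fetchall()
    con.close()
    return where_rows, on_rows


if __name__ == "__main__":
    where_rows, on_rows = compare()
    print("filter in WHERE:", where_rows)
    print("filter in ON:   ", on_rows)
```

    Under these assumptions the WHERE version returns only ('alice', 1), while the ON version returns all three users, with a count of 0 for bob and carol.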
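    The point about sum() ignoring NULL values can be checked the same way. This sketch (SQLite via Python's sqlite3, hypothetical data, REAL standing in for numeric) compares a per-row difference without COALESCE against the difference of two sums, with and without COALESCE around each sum.

```python
import sqlite3

# Hypothetical log rows: task 2 holds nothing but empty strings.
SETUP = """
CREATE TABLE task_log_entries (task_id INTEGER, old_value TEXT, new_value TEXT);
INSERT INTO task_log_entries VALUES
  (1, '2', '5'),
  (1, '',  '1'),   -- empty old_value becomes NULL after nullif()
  (2, '',  '');    -- a group with only empty strings
"""

QUERY = """
SELECT task_id
     -- per-row difference without COALESCE: any NULL operand drops the whole row
     , sum(CAST(nullif(new_value, '') AS REAL)
         - CAST(nullif(old_value, '') AS REAL))              AS per_row_diff
     -- difference of two sums: sum() skips NULLs on each side separately
     , sum(CAST(nullif(new_value, '') AS REAL))
     - sum(CAST(nullif(old_value, '') AS REAL))              AS diff_of_sums
     -- same, with COALESCE once per sum to handle all-NULL groups
     , coalesce(sum(CAST(nullif(new_value, '') AS REAL)), 0)
     - coalesce(sum(CAST(nullif(old_value, '') AS REAL)), 0) AS hours
FROM   task_log_entries
GROUP  BY task_id
ORDER  BY task_id
"""


def run_query():
    """Return one row per task_id with the three variants side by side."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

    For task 1 the per-row version yields 3.0, because the row with an empty old_value is dropped entirely, while the difference of sums yields 4.0. For task 2, which holds only empty strings, only the COALESCE variant returns 0 instead of NULL.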
