My goal is to find every user who has ever been assigned to a task, and then generate some statistics over a particular date range, and associate the stats with
The query can probably be simplified to:
SELECT u.name AS user_name
     , p.name AS project_name
     , tl.created_on::date AS changeday
     , coalesce(sum(nullif(new_value, '')::numeric), 0)
     - coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
FROM   users u
LEFT   JOIN (
        tasks t
   JOIN fixins f            ON f.id = t.fixin_id
   JOIN projects p          ON p.id = f.project_id
   JOIN task_log_entries tl ON tl.task_id = t.id
                           AND tl.field_id = 18
                           AND (tl.created_on IS NULL OR
                                tl.created_on >= '2013-09-08' AND
                                tl.created_on <  '2013-09-09') -- upper border!
   ) ON t.assignee_id = u.id
WHERE  EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3;
This returns all users that have ever had any task, plus data per project and day where data exists in the specified date range in task_log_entries.
The aggregate function sum() ignores NULL values. COALESCE() per row is not required any more as soon as you recast the calculation as the difference of two sums:
,coalesce(sum(nullif(new_value, '')::numeric), 0) -
coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
However, if it is possible that all rows of a selection have NULL or empty strings in these columns, sum() itself returns NULL. To cover that case, wrap each sum in COALESCE once.
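A minimal sketch of that behavior, using made-up rows in a CTE (the log data here is hypothetical, only the column names follow your query):

```sql
-- sum() skips NULL inputs; nullif() turns empty strings into NULL first.
WITH log(old_value, new_value) AS (
   VALUES ('1.5', '4')
        , (''   , '2')
        , (NULL , '')
   )
SELECT sum(nullif(new_value, '')::numeric) AS new_sum   -- 6
     , sum(nullif(old_value, '')::numeric) AS old_sum   -- 1.5
     , coalesce(sum(nullif(new_value, '')::numeric), 0)
     - coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours   -- 4.5
FROM   log;

-- If every input is NULL, sum() returns NULL, so COALESCE is applied once per sum.
SELECT sum(v)              AS plain     -- NULL
     , coalesce(sum(v), 0) AS wrapped   -- 0
FROM  (VALUES (NULL::numeric)) AS x(v);
```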
I am using numeric instead of float: a safer alternative that minimizes rounding errors.
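A quick way to see the difference, as a standalone check that does not touch your tables:

```sql
-- Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly; numeric can.
SELECT 0.1::float8  + 0.2::float8  = 0.3::float8  AS float_equal     -- false
     , 0.1::numeric + 0.2::numeric = 0.3::numeric AS numeric_equal;  -- true
```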
Your attempt to get distinct values from the join of users and tasks is futile, since you join to tasks once more further down. Flatten the whole query to make it simpler and faster.
These positional references are just a notational convenience:
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
... doing the same as in your original query.
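For illustration, a small self-contained sketch with made-up rows (the log CTE and its values are hypothetical), showing that the numbers simply point at the output columns in order:

```sql
WITH log(user_name, created_on, hours) AS (
   VALUES ('alice', timestamp '2013-09-08 09:00', 2.0)
        , ('alice', timestamp '2013-09-08 15:00', 1.5)
        , ('bob'  , timestamp '2013-09-08 10:00', 4.0)
   )
SELECT user_name
     , created_on::date AS changeday
     , sum(hours)       AS hours
FROM   log
GROUP  BY 1, 2   -- same as: GROUP BY user_name, created_on::date
ORDER  BY 1, 2;  -- same as: ORDER BY user_name, changeday
-- alice | 2013-09-08 | 3.5
-- bob   | 2013-09-08 | 4.0
```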
To get a date from a timestamp you can simply cast to date:
tl.created_on::date AS changeday
But it's much better to test with original values in the WHERE clause or JOIN condition (if possible, and it is possible here), so Postgres can use plain indices on the column (if available):
AND (tl.created_on IS NULL OR
tl.created_on >= '2013-09-08' AND
tl.created_on < '2013-09-09') -- next day as excluded upper border
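To make the contrast concrete, here is a rough sketch. The index is an assumption on my part (the question does not say which indexes exist), and whether the planner actually uses it depends on your data:

```sql
-- Hypothetical plain index on the raw column (name and existence are assumptions):
CREATE INDEX task_log_entries_created_on_idx ON task_log_entries (created_on);

-- Cast applied to the column: the plain index above cannot be used for this predicate.
SELECT count(*)
FROM   task_log_entries tl
WHERE  tl.created_on::date = '2013-09-08';

-- Range on the original column values: the plain index is applicable.
SELECT count(*)
FROM   task_log_entries tl
WHERE  tl.created_on >= '2013-09-08'
AND    tl.created_on <  '2013-09-09';
```

Prefixing either statement with EXPLAIN shows which plan Postgres picks on your installation.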
Note that a date literal is converted to a timestamp at 00:00 of the day in your current time zone. You need to pick the next day and exclude it as upper border. Or provide a more explicit timestamp literal like '2013-09-22 0:0 +2'::timestamptz.
For the requirement "every user who has ever been assigned to a task", add the WHERE clause:
WHERE EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
Most importantly: A LEFT [OUTER] JOIN preserves all rows to the left of the join. Adding a WHERE clause on the right table can void this effect. Instead, move the filter expression to the JOIN clause.
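A minimal sketch of the effect with made-up rows (tables u and t and their contents are hypothetical):

```sql
-- Filter in WHERE: rows without a match have t.status = NULL and are removed again.
WITH u(id, name)           AS (VALUES (1, 'alice'), (2, 'bob'))
   , t(assignee_id, status) AS (VALUES (1, 'open'))
SELECT u.name, t.status
FROM   u
LEFT   JOIN t ON t.assignee_id = u.id
WHERE  t.status = 'open';
-- alice | open

-- Same filter moved into the join condition: all rows of u survive.
WITH u(id, name)           AS (VALUES (1, 'alice'), (2, 'bob'))
   , t(assignee_id, status) AS (VALUES (1, 'open'))
SELECT u.name, t.status
FROM   u
LEFT   JOIN t ON t.assignee_id = u.id
             AND t.status = 'open'
ORDER  BY u.name;
-- alice | open
-- bob   | NULL
```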
Parentheses can be used to force the order in which tables are joined. Rarely needed for simple queries, but very useful in this case. I use the feature to join tasks, fixins, projects and task_log_entries before left-joining all of it to users, without a subquery.
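A reduced sketch of why the parentheses matter, with made-up rows standing in for three of your tables (the CTE contents are hypothetical):

```sql
-- Without parentheses: the later inner join drops bob, whose task has no log entry.
WITH users(id, name)                 AS (VALUES (1, 'alice'), (2, 'bob'))
   , tasks(id, assignee_id)          AS (VALUES (10, 1), (11, 2))
   , task_log_entries(task_id, hours) AS (VALUES (10, 2.5))
SELECT u.name, t.id AS task_id, tl.hours
FROM   users u
LEFT   JOIN tasks t             ON t.assignee_id = u.id
JOIN   task_log_entries tl      ON tl.task_id = t.id;
-- alice | 10 | 2.5

-- With parentheses: tasks and log entries are joined first, then left-joined to users.
WITH users(id, name)                 AS (VALUES (1, 'alice'), (2, 'bob'))
   , tasks(id, assignee_id)          AS (VALUES (10, 1), (11, 2))
   , task_log_entries(task_id, hours) AS (VALUES (10, 2.5))
SELECT u.name, t.id AS task_id, tl.hours
FROM   users u
LEFT   JOIN (
        tasks t
   JOIN task_log_entries tl ON tl.task_id = t.id
   ) ON t.assignee_id = u.id
ORDER  BY u.name;
-- alice | 10   | 2.5
-- bob   | NULL | NULL
```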
Table aliases make writing complex queries easier.