Question
I'm trying to collect details about backup activity from a PostgreSQL DB table on a backup appliance (Avamar). The table has several columns, including client_name, dataset, plugin_name, type, completed_ts, status_code, bytes_modified and more. Simplified example:
| session_id | client_name | dataset | plugin_name | type | completed_ts | status_code | bytes_modified |
|------------|-------------|---------|---------------------|------------------|----------------------|-------------|----------------|
| 1 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-05T01:00:00Z | 30900 | 11111111 |
| 2 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-04T01:00:00Z | 30000 | 22222222 |
| 3 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-03T01:00:00Z | 30000 | 22222222 |
| 4 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-02T01:00:00Z | 30000 | 22222222 |
| 5 | server01 | Windows | Windows VSS | Scheduled Backup | 2017-12-01T01:00:00Z | 30000 | 33333333 |
| 6 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-05T02:00:00Z | 30000 | 44444444 |
| 7 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-04T02:00:00Z | 30900 | 55555555 |
| 8 | server03 | Windows | Windows File System | On-Demand Backup | 2017-12-05T03:00:00Z | 30000 | 66666666 |
| 9 | server04 | Windows | Windows File System | Validate | 2017-12-05T03:00:00Z | 30000 | 66666666 |
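To try the queries outside SQL Fiddle, the sample rows above can be loaded into a scratch database. This is only a sketch: the column types are guesses from the sample values, and on the appliance v_activities_2 may well be a view rather than a table. The multi-row INSERT form needs PostgreSQL 8.2 or later.

```sql
-- Hypothetical local copy of the sample rows; all column types are assumptions.
create table v_activities_2 (
    session_id     integer primary key,
    client_name    varchar(64),
    dataset        varchar(64),
    plugin_name    varchar(64),
    type           varchar(64),
    completed_ts   timestamp with time zone,
    status_code    integer,
    bytes_modified bigint
);

insert into v_activities_2
    (session_id, client_name, dataset, plugin_name, type, completed_ts, status_code, bytes_modified)
values
    (1, 'server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 01:00:00+00', 30900, 11111111),
    (2, 'server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-04 01:00:00+00', 30000, 22222222),
    (3, 'server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-03 01:00:00+00', 30000, 22222222),
    (4, 'server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-02 01:00:00+00', 30000, 22222222),
    (5, 'server01', 'Windows', 'Windows VSS',         'Scheduled Backup', '2017-12-01 01:00:00+00', 30000, 33333333),
    (6, 'server02', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 02:00:00+00', 30000, 44444444),
    (7, 'server02', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-04 02:00:00+00', 30900, 55555555),
    (8, 'server03', 'Windows', 'Windows File System', 'On-Demand Backup', '2017-12-05 03:00:00+00', 30000, 66666666),
    (9, 'server04', 'Windows', 'Windows File System', 'Validate',         '2017-12-05 03:00:00+00', 30000, 66666666);
```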
Each client_name (server) can have multiple datasets, and each dataset can have multiple plugin_names. So I have created a SQL statement that does a GROUP BY on these three columns to get a list of "job" activity over time. (http://sqlfiddle.com/#!15/f15556/1)
select
client_name,
dataset,
plugin_name
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
Each of these jobs can succeed or fail, as indicated by the status_code column. Using a self-join with subqueries, I'm able to get the Last Good backup along with its completed_ts (completed time), bytes_modified and more: (http://sqlfiddle.com/#!15/f15556/16)
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
I can do the same thing separately to get the Last Attempt details by removing the status_code line from the WHERE clause: http://sqlfiddle.com/#!15/f15556/3. Note that most of the time LastGood and LastAttempt are the same row, but sometimes they are not, depending on whether the last backup was successful.
What I'm having problems with is merging these two statements together (if possible), so that I get this result:
| client_name | dataset | plugin_name | lastgood | lastgood_bytes | lastattempt | lastattempt_bytes |
|-------------|---------|---------------------|----------------------|-----------------|----------------------|-------------------|
| server01 | Windows | Windows File System | 2017-12-04T01:00:00Z | 22222222 | 2017-12-05T01:00:00Z | 11111111 |
| server01 | Windows | Windows VSS | 2017-12-01T01:00:00Z | 33333333 | 2017-12-01T01:00:00Z | 33333333 |
| server02 | Windows | Windows File System | 2017-12-05T02:00:00Z | 44444444 | 2017-12-05T02:00:00Z | 44444444 |
| server03 | Windows | Windows File System | 2017-12-05T03:00:00Z | 66666666 | 2017-12-05T03:00:00Z | 66666666 |
I tried simply adding another RIGHT JOIN to the end (http://sqlfiddle.com/#!15/f15556/4) but got NULL rows. After doing some reading, I see that the first two tables are joined first, creating a temporary table, before the second join occurs; by that point the data I need is lost, so I get NULL rows.
I'm using PostgreSQL 8 via Groovy scripting, and I only have read-only access to the DB.
Answer 1:
You apparently have two intermediate INNER JOIN output tables, and you want columns from each about the things identified by a common key. So INNER JOIN them on the key.
select
g.client_name,
g.dataset,
g.plugin_name,
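g.LastGood,
g.status_code as LastGood_status,
g.LastGood_bytes, -- the original was missing this comma, which aliased the column as LastAttempt
l.LastAttempt,
l.status_code as LastAttempt_status,
l.LastAttempt_bytes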
from
( -- cut & pasted Last Good http://sqlfiddle.com/#!15/f15556/16
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
) as g
join
( -- cut & pasted Last Attempt http://sqlfiddle.com/#!15/f15556/3
select
a1.client_name,
a1.dataset,
a1.plugin_name,
a1.LastAttempt,
a3.status_code,
a3.bytes_modified as LastAttempt_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2 a2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as a1
on a3.client_name = a1.client_name and
a3.dataset = a1.dataset and
a3.plugin_name = a1.plugin_name and
a3.completed_ts = a1.LastAttempt
) as l
on l.client_name = g.client_name and
l.dataset = g.dataset and
l.plugin_name = g.plugin_name
order by client_name, dataset, plugin_name
This uses one of the applicable approaches in "Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs". However, the correspondence between the chunks of code might not be obvious: its intermediate joins are LEFT where yours are INNER, and its GROUP_CONCAT is your MAX. (It also has more approaches because of particulars of GROUP_CONCAT and its query.) The two relevant approaches from that answer:
A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately similarly LEFT JOIN q1 & q3--1:many--then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id--1:1.
A correct cumulative LEFT JOIN approach: JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT; then left join that & q3--1:many--then GROUP BY & GROUP_CONCAT.
Whether this actually serves your purpose in general depends on your actual specification and constraints. Even if the two joins you link are what you want, you need to explain exactly what you mean by "merge". You don't say what you want if the joins have different sets of values for the grouped columns. Force yourself to use the English language to say what rows go in the result based on what rows are in the input.
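One possible reading of "merge", sketched here as an assumption rather than as the asker's stated requirement: keep every job that has a last attempt, and show NULL for LastGood when that job has never had a successful backup. Because the successful rows are a subset of the attempted rows, a LEFT JOIN from the attempts to the good runs covers that case. Only the timestamps are selected to keep the sketch short.

```sql
-- Assumed specification: one row per job with any backup attempt;
-- LastGood is null for jobs that have never succeeded.
select
    l.client_name,
    l.dataset,
    l.plugin_name,
    g.LastGood,
    l.LastAttempt
from (
    select client_name, dataset, plugin_name,
           max(completed_ts) as LastAttempt
    from v_activities_2
    where type like '%Backup%'
    group by client_name, dataset, plugin_name
) as l
left join (
    select client_name, dataset, plugin_name,
           max(completed_ts) as LastGood
    from v_activities_2
    where type like '%Backup%'
      and status_code in (30000, 30005) -- Successful (Good) status codes
    group by client_name, dataset, plugin_name
) as g
    on  g.client_name = l.client_name
    and g.dataset     = l.dataset
    and g.plugin_name = l.plugin_name
order by l.client_name, l.dataset, l.plugin_name;
```

With an INNER JOIN in place of the LEFT JOIN, a job whose every attempt failed would drop out of the result entirely, which may or may not be what is wanted.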
PS 1 You have undocumented/undeclared/unenforced constraints. Please declare them when possible; otherwise enforce them with triggers. Document them in the question text if they are not in code. Constraints are fundamental to multiple subrow value instances in JOIN and to GROUP BY.
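With read-only access, a constraint cannot be declared, but it can at least be checked. The queries in this thread quietly assume that a job (client_name, dataset, plugin_name) never has two rows with the same completed_ts; if it does, the self-join back to v_activities_2 returns several rows for that job. A minimal check of that assumed constraint:

```sql
-- Lists every (job, completed_ts) combination that occurs more than once.
-- No type filter here, because the outer side of the self-joins above
-- is not filtered by type either.
select
    client_name,
    dataset,
    plugin_name,
    completed_ts,
    count(*) as row_count
from v_activities_2
group by client_name, dataset, plugin_name, completed_ts
having count(*) > 1;
```

An empty result means the assumption holds for the current data, as it does for the sample rows in the question; it says nothing about rows added later.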
PS 2 Learn the syntax and semantics of SELECT. Learn what LEFT/RIGHT OUTER JOIN ON returns: what INNER JOIN ON does, plus unmatched left/right table rows extended by NULLs.
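A toy illustration of that difference, using inline VALUES lists (PostgreSQL 8.2 or later) instead of the real table. The names a, g, k, attempt and good are made up for the example.

```sql
-- INNER JOIN: only keys present on both sides survive.
select a.k, a.attempt, g.good
from (values (1, 'a1'), (2, 'a2')) as a(k, attempt)
join (values (1, 'g1')) as g(k, good)
    on g.k = a.k;
-- one row: (1, 'a1', 'g1')

-- LEFT JOIN: the same row, plus the unmatched left row extended by null.
select a.k, a.attempt, g.good
from (values (1, 'a1'), (2, 'a2')) as a(k, attempt)
left join (values (1, 'g1')) as g(k, good)
    on g.k = a.k;
-- two rows: (1, 'a1', 'g1') and (2, 'a2', null)
```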
PS 3 See "Is there any rule of thumb to construct SQL query from a human-readable description?"
Answer 2:
Here is an alternative that also works, but it is harder to follow and likely more specific to my dataset: http://sqlfiddle.com/#!15/f15556/114
select
Actvty.client_name,
Actvty.dataset,
Actvty.plugin_name,
ActvtyGood.LastGood,
ActvtyGood.status_code as LastGood_status,
ActvtyGood.bytes_modified as LastGood_bytes,
ActvtyOnly.LastAttempt,
Actvty.status_code as LastAttempt_status,
Actvty.bytes_modified as LastAttempt_bytes
from v_activities_2 Actvty
-- 1. Get last attempt of each job (which may or may not match last good)
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as ActvtyOnly
on Actvty.client_name = ActvtyOnly.client_name and
Actvty.dataset = ActvtyOnly.dataset and
Actvty.plugin_name = ActvtyOnly.plugin_name and
Actvty.completed_ts = ActvtyOnly.LastAttempt
-- 4. join the list of good runs with the table of last attempts, there would never be a job that has a last good without also a last attempt.
join (
-- 3. join last good runs with the full table to get the additional details of each
select
ActvtyGoodSub.client_name,
ActvtyGoodSub.dataset,
ActvtyGoodSub.plugin_name,
ActvtyGoodSub.LastGood,
ActvtyAll.status_code,
ActvtyAll.bytes_modified
from v_activities_2 ActvtyAll
-- 2. Get last Good run of each job
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as ActvtyGoodSub
on ActvtyAll.client_name = ActvtyGoodSub.client_name and
ActvtyAll.dataset = ActvtyGoodSub.dataset and
ActvtyAll.plugin_name = ActvtyGoodSub.plugin_name and
ActvtyAll.completed_ts = ActvtyGoodSub.LastGood
) as ActvtyGood
on Actvty.client_name = ActvtyGood.client_name and
Actvty.dataset = ActvtyGood.dataset and
Actvty.plugin_name = ActvtyGood.plugin_name
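For comparison, the same result shape can be sketched with PostgreSQL's DISTINCT ON, which keeps the first row of each group under the given ORDER BY. This is a different technique from either answer above, not a restatement of them, and it behaves differently in two ways: the type filter also applies to the detail row, and if a job has two rows with the same completed_ts it keeps one of them arbitrarily instead of returning both.

```sql
select
    l.client_name,
    l.dataset,
    l.plugin_name,
    g.completed_ts   as LastGood,
    g.status_code    as LastGood_status,
    g.bytes_modified as LastGood_bytes,
    l.completed_ts   as LastAttempt,
    l.status_code    as LastAttempt_status,
    l.bytes_modified as LastAttempt_bytes
from (
    -- newest backup row per job, whatever its status
    select distinct on (client_name, dataset, plugin_name)
           client_name, dataset, plugin_name,
           completed_ts, status_code, bytes_modified
    from v_activities_2
    where type like '%Backup%'
    order by client_name, dataset, plugin_name, completed_ts desc
) as l
join (
    -- newest successful backup row per job
    select distinct on (client_name, dataset, plugin_name)
           client_name, dataset, plugin_name,
           completed_ts, status_code, bytes_modified
    from v_activities_2
    where type like '%Backup%'
      and status_code in (30000, 30005) -- Successful (Good) status codes
    order by client_name, dataset, plugin_name, completed_ts desc
) as g
    on  g.client_name = l.client_name
    and g.dataset     = l.dataset
    and g.plugin_name = l.plugin_name
order by l.client_name, l.dataset, l.plugin_name;
```

Each derived table already carries the detail columns of its chosen row, so no further self-join back to v_activities_2 is needed.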
Source: https://stackoverflow.com/questions/47758492/multiple-self-join-based-on-group-by-results