Question
I am trying to calculate daily acceptance ratios from the 'connecting' table, which has 4 fields. Sample values:
date, action, sender_id, recipient_id
'2017-01-05', 'request_link', 'frank', 'joe'
'2017-01-06', 'request_link', 'sally', 'ann'
'2017-01-07', 'request_link', 'bill', 'ted'
'2017-01-07', 'accept_link', 'joe', 'frank'
'2017-01-06', 'accept_link', 'ann', 'sally'
'2017-01-06', 'accept_link', 'ted', 'bill'
Because there are 0 accepts and 1 request on 01-05, its daily acceptance ratio should be 0/1 = 0. Similarly, the ratio for 01-06 should be 2/1, and it should be 1/1 for 01-07.
It is important, however, that each accept_link has a corresponding request_link where the sender_id of the request_link = the recipient_id of the accept_link (and vice versa). So I believe a self-join is required here to ensure that, for example, Joe accepts Frank's request, regardless of the date.
How can the below query be corrected so that the aggregation works correctly while retaining the required join conditions? Will the query calculate correctly as is if the two WHERE conditions are removed, or are they necessary?
SELECT f1.date,
       SUM(CASE WHEN f2.action = 'accept_link' THEN 1 ELSE 0 END) /
       SUM(CASE WHEN f2.action = 'request_link' THEN 1 ELSE 0 END) AS acceptance_ratio
FROM connecting f1
LEFT JOIN connecting f2
    ON f1.sender_id = f2.recipient_id
LEFT JOIN connecting f2
    ON f1.recipient_id = f2.sender_id
WHERE f1.action = 'request_link'
    AND f2.action = 'accept_link'
GROUP BY f1.date
ORDER BY f1.date ASC
Expected output should look something like:
date acceptance_ratio
'2017-01-05' 0.0000
'2017-01-06' 2.0000
'2017-01-07' 1.0000
Thanks in advance.
Answer 1:
I don't think you need to drive the query from a self join here. Instead, use conditional aggregation over the entire table to count the number of requests and accepts which happened on each day; the single LEFT JOIN in the subquery only checks that each accept has a corresponding request:
SELECT t.date,
       CASE WHEN t.num_requests = 0
            THEN 'No requests available'
            ELSE CAST(t.num_accepts / t.num_requests AS CHAR(50))
       END AS acceptance_ratio
FROM
(
    SELECT c1.date,
           -- an accept only counts if a matching request row was found
           SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                    THEN 1 ELSE 0 END) AS num_accepts,
           SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
    FROM connecting c1
    LEFT JOIN connecting c2
        ON c1.action = 'accept_link' AND
           c2.action = 'request_link' AND
           -- the accept's sender/recipient must mirror the request's recipient/sender
           c1.sender_id = c2.recipient_id AND
           c1.recipient_id = c2.sender_id
    GROUP BY c1.date
) t
ORDER BY t.date
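Note here that I use a CASE expression to handle division by zero, which could occur should a certain day have no requests. I also assume here that the same invitation will not be sent out more than once; otherwise an accept would match several request rows in the join and be counted more than once.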
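As a quick sanity check, the approach can be tried against the sample rows from the question. The sketch below is only an illustration and makes several assumptions that are not in the original thread: it uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, it multiplies by 1.0 to force non-integer division (SQLite divides integers as integers), it returns NULL instead of the 'No requests available' string, and the names acceptance_ratios, SAMPLE_ROWS and QUERY are hypothetical. The join pairs c1.sender_id with c2.recipient_id and c1.recipient_id with c2.sender_id, as the question requires.

```python
import sqlite3

# Sample rows copied from the question: (date, action, sender_id, recipient_id)
SAMPLE_ROWS = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

# SQLite adaptation of the conditional-aggregation query (an assumption,
# not the exact MySQL statement from the answer).
QUERY = """
SELECT t.date,
       CASE WHEN t.num_requests = 0
            THEN NULL
            ELSE 1.0 * t.num_accepts / t.num_requests
       END AS acceptance_ratio
FROM (
    SELECT c1.date,
           SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                    THEN 1 ELSE 0 END) AS num_accepts,
           SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
    FROM connecting c1
    LEFT JOIN connecting c2
           ON c1.action = 'accept_link'
          AND c2.action = 'request_link'
          AND c1.sender_id = c2.recipient_id
          AND c1.recipient_id = c2.sender_id
    GROUP BY c1.date
) t
ORDER BY t.date
"""


def acceptance_ratios(rows):
    """Load rows into an in-memory table and return (date, ratio) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE connecting "
            "(date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)"
        )
        conn.executemany("INSERT INTO connecting VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for date, ratio in acceptance_ratios(SAMPLE_ROWS):
        print(date, ratio)
```

With the sample rows this yields 0.0, 2.0 and 1.0 for 2017-01-05, 2017-01-06 and 2017-01-07 respectively, which matches the expected output in the question. An accept with no mirrored request would leave c2.action NULL and so would not be counted.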
Source: https://stackoverflow.com/questions/42130476/calculating-acceptance-ratio-with-left-join-and-self-join-and-aggregate-function