问题
I have a table in big-query (datawarehouse):
and I would like to have the result of:
Here is the explanation on how the calculation should be:
- 2017-10-01 = $100 is obvious, because the data is only one
- 2017-10-02 = $400 is a sum of the first row and third row. Why? Because second row and third row have the same invoice. So we only use the latest update.
- 2017-10-04 = $800 is a sum of row 1,3, and 4. Why? It is because we only take one invoice only per day. row 1 (T001), row 3(T002), row 4(T003)
- 2017-10-05 = $100 is a sum of row 1,5, and 6. Why? It is because we only take one invoice only per day. row 1 (T001), row 5(T002), row 6(T003)
I honestly have completely lost how to do that. I have tried multiple times to group by and etc. But none of them work as expected. This is my latest effort so far for today:
SELECT
amount,
updatedDateOnly,
invNo
FROM
(
SELECT
invNo,
UpdatedDate,
amount,
DATE(updatedDate) as updatedDateOnly,
row_number() OVER (PARTITION BY invNo ORDER BY UpdatedDate DESC) AS rownum
FROM [project:dataset.test]
)
WHERE
rownum = 1
only returns the last date. Now, I have no idea how to query for daily basis.
Appreciate for anyone who is expert and willing to help in querying. Thank you.
UPDATE: Data in json, in case you want to try in bigquery or other SQL servers:
{"UpdatedDate":"2017-10-01 01:00:00","InvNo":"T001","amount":100}
{"UpdatedDate":"2017-10-02 01:00:00","InvNo":"T002","amount":200}
{"UpdatedDate":"2017-10-02 02:00:00","InvNo":"T002","amount":300}
{"UpdatedDate":"2017-10-04 01:00:00","InvNo":"T003","amount":400}
{"UpdatedDate":"2017-10-05 01:00:00","InvNo":"T002","amount":500}
{"UpdatedDate":"2017-10-05 02:00:00","InvNo":"T003","amount":500}
回答1:
Below is for BigQuery Standard SQL
#standardSQL
WITH dates AS (
SELECT DISTINCT DATE(UpdatedDate) UpdatedDay
FROM `project.dataset.test`
),
qualified AS (
SELECT DATE(UpdatedDate) UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY UpdatedDate DESC LIMIT 1)[SAFE_OFFSET(0)] amount
FROM `project.dataset.test`
GROUP BY UpdatedDay, InvNo
)
SELECT UpdatedDay, SUM(amount) amount
FROM (
SELECT d.UpdatedDay UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY q.UpdatedDay DESC LIMIT 1)[SAFE_OFFSET(0)] amount
FROM dates d
JOIN qualified q
ON q.UpdatedDay <= d.UpdatedDay
GROUP BY UpdatedDay, InvNo
)
GROUP BY UpdatedDay
-- ORDER BY UpdatedDay
You can test / play with this with below dummy data from your question
#standardSQL
WITH `project.dataset.test` AS (
SELECT TIMESTAMP '2017-10-01 01:00:00' UpdatedDate, 'T001' InvNo, 100 amount UNION ALL
SELECT TIMESTAMP '2017-10-02 01:00:00', 'T002', 200 UNION ALL
SELECT TIMESTAMP '2017-10-02 02:00:00', 'T002', 300 UNION ALL
SELECT TIMESTAMP '2017-10-04 01:00:00', 'T003', 400 UNION ALL
SELECT TIMESTAMP '2017-10-05 01:00:00', 'T002', 500 UNION ALL
SELECT TIMESTAMP '2017-10-05 02:00:00', 'T003', 500
),
dates AS (
SELECT DISTINCT DATE(UpdatedDate) UpdatedDay
FROM `project.dataset.test`
),
qualified AS (
SELECT DATE(UpdatedDate) UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY UpdatedDate DESC LIMIT 1)[SAFE_OFFSET(0)] amount
FROM `project.dataset.test`
GROUP BY UpdatedDay, InvNo
)
SELECT UpdatedDay, SUM(amount) amount
FROM (
SELECT d.UpdatedDay UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY q.UpdatedDay DESC LIMIT 1)[SAFE_OFFSET(0)] amount
FROM dates d
JOIN qualified q
ON q.UpdatedDay <= d.UpdatedDay
GROUP BY UpdatedDay, InvNo
)
GROUP BY UpdatedDay
ORDER BY UpdatedDay
Result is as expected
UpdatedDay amount
2017-10-01 100
2017-10-02 400
2017-10-04 800
2017-10-05 1100
回答2:
On each date, you need the most recent amount for each invoice. That is rather complicated. One solution is to start with a cross join of the dates and your records. Then window functions can be used to get the most recent amounts:
select dte,
sum(case when seqnum = 1 then amount else 0 end) as amount
from (select d.dte, t.*,
row_number() over (partition by t.invNo order by t.UpdatedDate desc) as seqnum
from (select distinct date(UpdatedDate) as dte
from `project.dataset.test` t
) d join
`project.dataset.test` t
on date(t.UpdatedDate) <= d.dte
) td
group by dte;
来源:https://stackoverflow.com/questions/46781209/bigquery-aggregate-for-daily-basis