Question
I'd like to create a new table in Google BigQuery that contains the existing daily revenue data, and then extend it with forecast data that is derived from the existing data and still needs to be generated. Once actual data exists for a given day, it should override the forecast data for that day. The forecast for the remaining days until the end of the month should then be updated again.
So far I have come up with the following query, which fails with the error message "Scalar subquery produced more than one element":
SELECT
date, sum(yl_revenue), 'ACTUAL' as type
from project.dataset.table
where date >"2020-01-01" and date < current_date()
group by date
union distinct
SELECT
(select calendar_date
FROM
UNNEST(GENERATE_DATE_ARRAY('2020-01-01', DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY), INTERVAL 1 DAY))
AS calendar_date),
avg(revenue_daily) as average_daily_revenue,
'FORECAST' as type FROM
(SELECT sum(revenue) as revenue_daily from project.dataset.table
WHERE date > "2020-01-01" and extract(month from date) = extract (month from current_date()) group by date)
This is how I would like the data to look:
+------------+------------+----------+
| date | revenue | type |
+------------+------------+----------+
| 01.04.2020 | 100 € | ACTUAL |
| … | 5.000 € | ACTUAL |
| 23.04.2020 | 200 € | ACTUAL |
| 24.04.2020 | 230,43 € | FORECAST |
| 25.04.2020 | 230,43 € | FORECAST |
| 26.04.2020 | 230,43 € | FORECAST |
| 27.04.2020 | 230,43 € | FORECAST |
| 28.04.2020 | 230,43 € | FORECAST |
| 29.04.2020 | 230,43 € | FORECAST |
| 30.04.2020 | 230,43 € | FORECAST |
+------------+------------+----------+
On the next day (24.04.2020) it should look like this:
+------------+--------------+----------+
| date | revenue | type |
+------------+--------------+----------+
| 01.04.2020 | 100 € | ACTUAL |
| … | 5.000 € | ACTUAL |
| 23.04.2020 | 200 € | ACTUAL |
| 24.04.2020 | 1.000,00 € | ACTUAL | <----
| 25.04.2020 | 262,50 € | FORECAST |
| 26.04.2020 | 262,50 € | FORECAST |
| 27.04.2020 | 262,50 € | FORECAST |
| 28.04.2020 | 262,50 € | FORECAST |
| 29.04.2020 | 262,50 € | FORECAST |
| 30.04.2020 | 262,50 € | FORECAST |
+------------+--------------+----------+
The forecast value is simply the sum of the month's actual revenue divided by the number of days of the month so far. Notice that the daily forecast value changes in the second table because a new actual value was added.
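Written as a formula, with r_i the actual revenue of day i and d the number of days of the month so far, the rule is as follows. Reading the "…" row as the combined revenue of 2 to 22 April is an assumption, but it reproduces both forecast values in the tables above:

```latex
\begin{aligned}
\hat{r} &= \frac{1}{d}\sum_{i=1}^{d} r_i \\
d = 23:\quad \hat{r} &= \frac{100 + 5000 + 200}{23} = \frac{5300}{23} \approx 230.43 \\
d = 24:\quad \hat{r} &= \frac{5300 + 1000}{24} = \frac{6300}{24} = 262.50
\end{aligned}
```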
Any help on how to approach this is much appreciated!
Thanks
Jan
Answer 1:
Whenever data for a new day is added, you can run the statement below to update the remaining (forecast) days:
UPDATE `project.dataset.table`
SET revenue = (
SELECT ROUND(SUM(revenue) / COUNT(1), 2)
FROM `project.dataset.table`
WHERE type = 'ACTUAL'
)
WHERE type = 'FORECAST'
The above assumes you have monthly tables with all days of the month pre-created in them. If you have a different layout, the query can easily be adjusted for it.
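A minimal Python sketch of the same idea, assuming (as the answer does) a month table in which every day already has a row. The names DayRow, record_actual and refresh_forecast are hypothetical stand-ins for the BigQuery table and statements; this is not a BigQuery client example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DayRow:
    """One row of a hypothetical monthly table with all days pre-created."""
    day: date
    revenue: float
    type: str  # "ACTUAL" or "FORECAST"


def record_actual(rows: list[DayRow], day: date, revenue: float) -> None:
    """Overwrite a pre-created day with its actual revenue (actual beats forecast)."""
    for row in rows:
        if row.day == day:
            row.revenue = revenue
            row.type = "ACTUAL"
            return
    raise KeyError(f"{day} has no pre-created row")


def refresh_forecast(rows: list[DayRow]) -> None:
    """Set every FORECAST row to ROUND(SUM(actual) / COUNT(actual), 2).

    Python's round() uses round-half-to-even on binary floats, whereas
    BigQuery's ROUND rounds halfway cases away from zero, so exact ties
    may differ by a cent.
    """
    actual = [row.revenue for row in rows if row.type == "ACTUAL"]
    if not actual:
        return  # nothing to average yet; this sketch leaves the rows unchanged
    average = round(sum(actual) / len(actual), 2)
    for row in rows:
        if row.type == "FORECAST":
            row.revenue = average
```

Calling record_actual for a new day and then refresh_forecast corresponds to loading the new actual row and re-running the UPDATE; every remaining FORECAST row picks up the new average.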
Answer 2:
I found a solution to my problem (although it may not be the most sophisticated one).
I came up with three new tables:
1) A table that provides past and future dates, which is why I called it 'calendar'.
2) A table (forecast_month) that provides revenue data for the current month. I overwrite this table every day with a scheduled query, which provides actual past data and forecast future data (based on the actual data of the month) up to the end of the current month.
3) A table (forecast_table) that provides past data (going back further than just the current month), plus the daily updated data from 2). I use a scheduled MERGE query for this one, too.
Here are the respective queries:
1)
SELECT
  *
FROM
  UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2030-12-31', INTERVAL 1 DAY)) AS calendar_date
  WITH OFFSET AS offset
ORDER BY
  offset
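For readers less familiar with GENERATE_DATE_ARRAY, here is a rough Python equivalent of what query 1 returns: one row per day with a zero-based offset, with both bounds included. generate_calendar is a hypothetical helper, not part of any library.

```python
from datetime import date, timedelta


def generate_calendar(start: date, end: date) -> list[tuple[date, int]]:
    """Return (calendar_date, offset) pairs like
    UNNEST(GENERATE_DATE_ARRAY(start, end, INTERVAL 1 DAY)) WITH OFFSET.

    Both start and end are included, and offset counts from 0.
    """
    span = (end - start).days
    return [(start + timedelta(days=i), i) for i in range(span + 1)]


# Same range as query 1
calendar_rows = generate_calendar(date(2018, 1, 1), date(2030, 12, 31))
```

For the range in query 1 this yields 4,748 rows (offsets 0 to 4747), since 2020, 2024 and 2028 are leap years.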
2)
-- Actual daily revenue of the current month
SELECT
  date,
  'actual' AS type,
  ROUND(SUM(revenue), 2) AS revenue
FROM
  `project.dataset.revenue_data`
WHERE
  EXTRACT(YEAR FROM date) = EXTRACT(YEAR FROM CURRENT_DATE())
  AND EXTRACT(MONTH FROM date) = EXTRACT(MONTH FROM CURRENT_DATE())
GROUP BY
  date
UNION DISTINCT
-- Forecast rows from today until the last day of the current month
SELECT
  calendar_date,
  'forecast',
  (
    -- Average of the month's daily revenue totals so far
    SELECT
      ROUND(AVG(revenue_daily), 2)
    FROM (
      SELECT
        SUM(revenue) AS revenue_daily
      FROM
        `project.dataset.revenue_data`
      WHERE
        EXTRACT(YEAR FROM date) = EXTRACT(YEAR FROM CURRENT_DATE())
        AND EXTRACT(MONTH FROM date) = EXTRACT(MONTH FROM CURRENT_DATE())
      GROUP BY
        date
    ) AS average_daily_revenue
  )
FROM
  `project.dataset.calendar`
WHERE
  calendar_date >= CURRENT_DATE()
  AND calendar_date <= DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
ORDER BY
  date
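The logic of query 2 can be mirrored in a few lines of Python, which makes it easy to check the numbers before scheduling the query. This is a sketch under assumptions: build_month_table and its dict input (date to that day's summed revenue) are hypothetical, and the real work happens in BigQuery, not Python.

```python
from calendar import monthrange
from datetime import date, timedelta
from typing import Optional


def build_month_table(
    daily_revenue: dict[date, float], today: date
) -> list[tuple[date, str, Optional[float]]]:
    """Rebuild the current month's rows the way query 2 does.

    - 'actual' rows: every date in today's month that has revenue data.
    - 'forecast' rows: today through the last day of the month, each set to
      the rounded average of the month's daily totals (None when there are
      no totals, like AVG over zero rows returning NULL).
    """
    in_month = {
        d: r for d, r in daily_revenue.items()
        if (d.year, d.month) == (today.year, today.month)
    }
    rows = [(d, "actual", round(r, 2)) for d, r in in_month.items()]

    average = round(sum(in_month.values()) / len(in_month), 2) if in_month else None
    last_day = today.replace(day=monthrange(today.year, today.month)[1])
    day = today
    while day <= last_day:
        rows.append((day, "forecast", average))
        day += timedelta(days=1)

    # ORDER BY date; sorted() is stable, so an actual row for today stays first
    return sorted(rows, key=lambda row: row[0])
```

Because the whole month is recomputed on every run, a new actual day automatically shifts the average for all remaining forecast days, which is why overwriting forecast_month daily is enough.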
3)
MERGE
  `project.dataset.forecast_table` f
USING
  `project.dataset.forecast_month` m
ON
  f.date = m.date
WHEN MATCHED THEN
  UPDATE SET f.type = m.type, f.revenue = m.revenue
WHEN NOT MATCHED AND m.date >= CURRENT_DATE() THEN
  INSERT (date, type, revenue)
  VALUES (date, type, revenue)
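To make the MERGE rules concrete, here is a small Python sketch of the same decisions on in-memory tables keyed by date. merge_month_into_history and the Row alias are hypothetical names; the sketch only models which rows get updated or inserted, not BigQuery's execution.

```python
from datetime import date

# (type, revenue) for one date; a hypothetical stand-in for a table row
Row = tuple[str, float]


def merge_month_into_history(
    history: dict[date, Row], month: dict[date, Row], today: date
) -> None:
    """Apply query 3's MERGE rules in place.

    WHEN MATCHED: overwrite type and revenue, so a new 'actual' row
    replaces the earlier 'forecast' row for the same date.
    WHEN NOT MATCHED AND date >= today: insert the month row.
    Dates in history that are not in month are left untouched.
    """
    for day, row in month.items():
        if day in history or day >= today:
            history[day] = row
```

One consequence of the m.date >= CURRENT_DATE() condition, visible in the sketch: a past date that is not yet in forecast_table is never inserted by this MERGE; only dates that already exist there are updated.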
Source: https://stackoverflow.com/questions/61388533/continue-existing-table-until-end-of-month-with-forecasted-data-and-update-daily