Continue existing table until end of month with forecasted data and update daily

Question

I'd like to create a new table in Google BigQuery that contains the existing daily revenue data and extends it, up to the end of the month, with forecast data that is computed from the existing data. Once actual data exists for a given day, it should replace the forecast for that day, and the forecast for the remaining days of the month should then be recalculated.

So far I have come up with the following query, which fails with the error "Scalar subquery produced more than one element":

    SELECT
        date, sum(yl_revenue), 'ACTUAL' as type
    from project.dataset.table
    where date > "2020-01-01" and date < current_date()
    group by date
    union distinct
    SELECT
        (select calendar_date
         FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY), INTERVAL 1 DAY)) AS calendar_date),
        avg(revenue_daily) as average_daily_revenue,
        'FORECAST' as type
    FROM
        (SELECT sum(revenue) as revenue_daily from project.dataset.table
         WHERE date > "2020-01-01" and extract(month from date) = extract(month from current_date()) group by date)
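The error comes from the scalar subquery (select calendar_date FROM UNNEST(...)): it returns one row per generated date, but a subquery used as a value in the SELECT list may return at most one value. A minimal sketch of one way around it moves the date array into the FROM clause, so that each remaining day of the month becomes its own row. It assumes a single revenue column named `revenue` in `project.dataset.table` (the query above mixes `yl_revenue` and `revenue`) and averages only the current month's days:

```sql
-- Sketch: actual daily totals plus a flat forecast up to the end of the month.
WITH daily AS (
  -- One row per day that has actual data (history and current month).
  SELECT date, SUM(revenue) AS revenue
  FROM `project.dataset.table`
  WHERE date > '2020-01-01' AND date < CURRENT_DATE()
  GROUP BY date
),
month_avg AS (
  -- Average daily revenue of the current month so far (NULL on the 1st).
  SELECT ROUND(AVG(revenue), 2) AS revenue
  FROM daily
  WHERE date >= DATE_TRUNC(CURRENT_DATE(), MONTH)
)
SELECT date, revenue, 'ACTUAL' AS type
FROM daily
UNION ALL
-- UNNEST in FROM yields one row per day from today to the month's last day.
SELECT calendar_date, month_avg.revenue, 'FORECAST'
FROM month_avg,
  UNNEST(GENERATE_DATE_ARRAY(
    CURRENT_DATE(),
    DATE_SUB(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH), INTERVAL 1 DAY)
  )) AS calendar_date
ORDER BY date;
```

Because everything is derived from CURRENT_DATE(), rerunning the query each day (for example as a scheduled query that overwrites its destination table) would turn yesterday's forecast row into an actual one and recompute the remaining forecast.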

This is how I would like the data to look:

+------------+------------+----------+
|    date    |  revenue   |   type   |
+------------+------------+----------+
| 01.04.2020 | 100 €      | ACTUAL   |
| …          | 5.000 €    | ACTUAL   |
| 23.04.2020 | 200 €      | ACTUAL   |
| 24.04.2020 |  230,43 €  | FORECAST |
| 25.04.2020 |  230,43 €  | FORECAST |
| 26.04.2020 |  230,43 €  | FORECAST |
| 27.04.2020 |  230,43 €  | FORECAST |
| 28.04.2020 |  230,43 €  | FORECAST |
| 29.04.2020 |  230,43 €  | FORECAST |
| 30.04.2020 |  230,43 €  | FORECAST |
+------------+------------+----------+

On the next day (24.04.2020) it should look like this:

+------------+--------------+----------+
|    date    |   revenue    |   type   |
+------------+--------------+----------+
| 01.04.2020 | 100 €        | ACTUAL   |
| …          | 5.000 €      | ACTUAL   |
| 23.04.2020 | 200 €        | ACTUAL   |
| 24.04.2020 |  1.000,00 €  | ACTUAL   | <----
| 25.04.2020 |  262,50 €    | FORECAST |
| 26.04.2020 |  262,50 €    | FORECAST |
| 27.04.2020 |  262,50 €    | FORECAST |
| 28.04.2020 |  262,50 €    | FORECAST |
| 29.04.2020 |  262,50 €    | FORECAST |
| 30.04.2020 |  262,50 €    | FORECAST |
+------------+--------------+----------+

The daily forecast value is simply the month's actual revenue so far divided by the number of days elapsed in the month. Note that the forecast value changes in the second table because a new actual value has been added.
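As a check of that rule against the two tables, reading the "…" row as the combined revenue of the days in between (an interpretation, but it makes the figures add up exactly):

```latex
\text{forecast on } 23.04 = \frac{100 + 5000 + 200}{23} = \frac{5300}{23} \approx 230.43
\qquad
\text{forecast on } 24.04 = \frac{5300 + 1000}{24} = \frac{6300}{24} = 262.50
```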

Any help on how to approach this is much appreciated!

Thanks

Jan

Answer 1:

When a new day's actual value has been added, you can run the statement below to update the remaining days:

UPDATE `project.dataset.table`
SET revenue = (
  SELECT ROUND(SUM(revenue) / COUNT(1), 2) 
  FROM `project.dataset.table`
  WHERE type = 'ACTUAL'
)
WHERE type = 'FORECAST'

The above assumes you have monthly tables with all days of the month pre-created in them. If you have a different layout, the statement can easily be adjusted.
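The UPDATE only touches rows that already exist. A minimal sketch of pre-creating the current month's rows as FORECAST placeholders, assuming the same `project.dataset.table` with `date`, `revenue` and `type` columns, and leaving `revenue` NULL until the UPDATE fills it:

```sql
-- Sketch: add a FORECAST placeholder row for every day of the current month
-- that is not in the table yet. Names follow the UPDATE above.
INSERT INTO `project.dataset.table` (date, revenue, type)
SELECT
  d,
  NULL,
  'FORECAST'
FROM
  UNNEST(GENERATE_DATE_ARRAY(
    DATE_TRUNC(CURRENT_DATE(), MONTH),
    DATE_SUB(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH), INTERVAL 1 DAY)
  )) AS d
WHERE
  -- Skip days that already have a row (ACTUAL or FORECAST).
  NOT EXISTS (
    SELECT 1 FROM `project.dataset.table` t WHERE t.date = d
  );
```

With this layout, the row for a day whose actual value arrives would still need its `type` set to 'ACTUAL' and its `revenue` filled in before the UPDATE recomputes the remaining FORECAST rows.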

Answer 2:

I found a solution to my problem (although it may not be the most sophisticated one).

I came up with three new tables:

1) `calendar`: provides past and future dates, which is why I called it "calendar".
2) `forecast_month`: provides revenue data for the current month. I overwrite this table every day with a scheduled query, which provides actual past data and forecast future data (based on the actual data of the month) until the end of the current month.
3) `forecast_table`: provides past data (going back further than just the current month) plus the daily updated data from 2). I use a scheduled MERGE query for this one, too.

Here are the respective queries:

SELECT
  *
FROM
  UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2030-12-31', INTERVAL 1 DAY)) AS calendar_date
WITH OFFSET AS offset
ORDER BY
  offset
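The calendar only has to be built once. As a sketch, the same query can be materialized with a DDL statement instead of saving the result by hand; `project.dataset.calendar` is the name the second query reads from, and the offset column is left out because nothing downstream uses it:

```sql
-- One-off: one row per day from 2018-01-01 to 2030-12-31.
CREATE OR REPLACE TABLE `project.dataset.calendar` AS
SELECT
  calendar_date
FROM
  UNNEST(GENERATE_DATE_ARRAY(DATE '2018-01-01', DATE '2030-12-31', INTERVAL 1 DAY)) AS calendar_date;
```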

SELECT
  date,
  'actual' AS type,
  -- alias required: the MERGE reads this column as m.revenue
  ROUND(SUM(revenue), 2) AS revenue
FROM
  `project.dataset.revenue_data`
WHERE
  EXTRACT(YEAR FROM date) = EXTRACT(YEAR FROM CURRENT_DATE())
  AND EXTRACT(MONTH FROM date) = EXTRACT(MONTH FROM CURRENT_DATE())
GROUP BY
  date
UNION DISTINCT
SELECT
  calendar_date,
  'forecast',
  -- average of the current month's daily totals
  (
  SELECT
    ROUND(AVG(revenue_daily), 2)
  FROM (
    SELECT
      SUM(revenue) AS revenue_daily
    FROM
      `project.dataset.revenue_data`
    WHERE
      EXTRACT(YEAR FROM date) = EXTRACT(YEAR FROM CURRENT_DATE())
      AND EXTRACT(MONTH FROM date) = EXTRACT(MONTH FROM CURRENT_DATE())
    GROUP BY
      date) AS average_daily_revenue)
FROM
  `project.dataset.calendar`
WHERE
  calendar_date >= CURRENT_DATE()
  AND calendar_date <= DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
ORDER BY
  date
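The 'actual' branch covers every day of the current month that has data, while the 'forecast' branch starts at CURRENT_DATE(), so if some revenue for today has already been loaded, today appears twice (UNION DISTINCT keeps both rows because the type differs). BigQuery rejects a MERGE in which one target row matches more than one source row, so a quick check against the `project.dataset.forecast_month` table that the MERGE reads from may be worthwhile:

```sql
-- Lists dates that appear more than once in the monthly table.
-- An empty result means every date has exactly one row.
SELECT
  date,
  COUNT(*) AS row_count
FROM
  `project.dataset.forecast_month`
GROUP BY
  date
HAVING
  COUNT(*) > 1
ORDER BY
  date;
```

If it returns rows, one option is to end the 'actual' branch at date < CURRENT_DATE(), as the query in the question does.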

MERGE
  `project.dataset.forecast_table` f
USING
  `project.dataset.forecast_month` m
ON
  f.date = m.date
WHEN MATCHED THEN
  UPDATE SET f.type = m.type, f.revenue = m.revenue
WHEN NOT MATCHED AND m.date >= CURRENT_DATE() THEN
  INSERT (date, type, revenue)
  VALUES (m.date, m.type, m.revenue)
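As a variation (a sketch, not what this answer does), the monthly table and the MERGE could be collapsed into one scheduled statement whose source is a subquery, so no intermediate `forecast_month` table is needed. It keeps the table and column names used above, ends the actual branch at yesterday so no date appears twice, inserts any missing date, and uses SAFE_DIVIDE so the first day of a month (no actuals yet) yields a NULL forecast instead of a division error:

```sql
-- Sketch: rebuild the current month inside the MERGE source and upsert it
-- into forecast_table in a single statement.
MERGE `project.dataset.forecast_table` f
USING (
  -- Actual daily totals of the current month, up to yesterday.
  SELECT date, 'actual' AS type, ROUND(SUM(revenue), 2) AS revenue
  FROM `project.dataset.revenue_data`
  WHERE date >= DATE_TRUNC(CURRENT_DATE(), MONTH) AND date < CURRENT_DATE()
  GROUP BY date
  UNION ALL
  -- Forecast from today to the month's last day: month-to-date revenue
  -- divided by the number of days that have data (same as AVG of daily sums).
  SELECT
    calendar_date,
    'forecast',
    (SELECT ROUND(SAFE_DIVIDE(SUM(revenue), COUNT(DISTINCT date)), 2)
     FROM `project.dataset.revenue_data`
     WHERE date >= DATE_TRUNC(CURRENT_DATE(), MONTH) AND date < CURRENT_DATE())
  FROM UNNEST(GENERATE_DATE_ARRAY(
    CURRENT_DATE(),
    DATE_SUB(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH), INTERVAL 1 DAY)
  )) AS calendar_date
) m
ON f.date = m.date
WHEN MATCHED THEN
  UPDATE SET type = m.type, revenue = m.revenue
WHEN NOT MATCHED THEN
  INSERT (date, type, revenue) VALUES (m.date, m.type, m.revenue);
```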

Source: https://stackoverflow.com/questions/61388533/continue-existing-table-until-end-of-month-with-forecasted-data-and-update-daily

Tags

sql

google-bigquery

forecasting