How to generate date series to occupy absent dates in google BiqQuery?

后端 未结 5 2127
自闭症患者
自闭症患者 2020-12-03 11:44

I am trying to get daily sum of sales from a google big-query table. I used following code for that.

select Day(InvoiceDate) date, Sum(InvoiceAmount) sales f         


        
相关标签:
5条回答
  • 2020-12-03 12:13

    For Standard SQL

    WITH
    
    splitted AS (
      SELECT
        *
      FROM
        UNNEST( SPLIT(RPAD('',
              1 + DATE_DIFF(CURRENT_DATE(), DATE("2015-06-01"), DAY),
              '.'),''))),
      with_row_numbers AS (
      SELECT
        ROW_NUMBER() OVER() AS pos,
        *
      FROM
        splitted),
      calendar_day AS (
      SELECT
        DATE_ADD(DATE("2015-06-01"), INTERVAL (pos - 1) DAY) AS day
      FROM
        with_row_numbers)
    SELECT
      *
    FROM
      calendar_day
    ORDER BY
      day DESC
    
    0 讨论(0)
  • 2020-12-03 12:15

    Generting a list of dates and then joining whatever table you need on top seems the easiest. I used the generate_date_array + unnest and it looks quite clean.

    To generate a list of days (one day per row):

      SELECT
      *
      FROM 
        UNNEST(GENERATE_DATE_ARRAY('2018-10-01', '2020-09-30', INTERVAL 1 DAY)) AS example
    
    0 讨论(0)
  • 2020-12-03 12:20

    You can use below to generate on fly all dates in given range (in below example it is all dates from 2015-06-01 till CURRENT_DATE() - by changing those you can control which dates range to generate)

    SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
    FROM (
         SELECT ROW_NUMBER() OVER() AS pos, *
         FROM (FLATTEN((
         SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
         FROM (SELECT NULL)),h
    )))
    

    so, now - you can use it with LEFT JOIN with your table to have all dates accounted. See potential example below

    SELECT
      calendar_day,
      IFNULL(sales, 0) AS sales
    FROM (
      SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
      FROM (
           SELECT ROW_NUMBER() OVER() AS pos, *
           FROM (FLATTEN((
           SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
           FROM (SELECT NULL)),h
      )))
    ) AS all_dates
    LEFT JOIN (
      SELECT DAY(InvoiceDate) DATE, SUM(InvoiceAmount) sales 
      FROM test_gmail_com.sales 
      WHERE YEAR(InvoiceDate) = YEAR(CURRENT_DATE()) AND
      MONTH(InvoiceDate) = MONTH(CURRENT_DATE())
      GROUP BY DATE 
    )
    ON DATE = calendar_day  
    

    I wanna need to get previous months sales

    Below gives all days of previous month

    SELECT DATE(DATE_ADD(DATE_ADD(DATE_ADD(CURRENT_DATE(), -1, "MONTH"), 1 - DAY(CURRENT_DATE()), "DAY"), pos - 1, "DAY")) AS calendar_day
    FROM (
         SELECT ROW_NUMBER() OVER() AS pos, *
         FROM (FLATTEN((
         SELECT SPLIT(RPAD('', 1 + DATEDIFF(DATE_ADD(CURRENT_DATE(), - DAY(CURRENT_DATE()), "DAY"), DATE_ADD(DATE_ADD(CURRENT_DATE(), -1, "MONTH"), 1 - DAY(CURRENT_DATE()), "DAY")), '.'),'') AS h
         FROM (SELECT NULL)),h
    )))
    
    0 讨论(0)
  • 2020-12-03 12:21

    For these purposes it is practical to have a 'calendar' table, a table that just lists all the days within a certain range. For your specific question, it would suffice to have a table with the numbers 1 to 31. A quick way to get this table is to make a spreadsheet with these numbers, save it as a csv file and import this file into BigQuery as a table.

    You then left outer join your result set onto this table, with ifnull(sales,0) as sales.

    If you want the number of days per month (28--31) to be right, you basically have two options. Either you create a proper calendar table that covers several years and that you join on using year, month and day. Or you use the simple table with numbers 1--31 and remove numbers based on the month and the year.

    0 讨论(0)
  • 2020-12-03 12:29

    Using the Standard SQL dialect and the generate_array function to simplify the code:

    WITH serialnum AS (
      SELECT
        sn
      FROM
        UNNEST(GENERATE_ARRAY(0, 
                              DATE_DIFF(DATE_ADD(DATE_TRUNC(CURRENT_DATE()
                                                          , MONTH)
                                              , INTERVAL 1 MONTH)
                                      , DATE_TRUNC(CURRENT_DATE(), MONTH)
                                      , DAY) - 1)
                              ) AS sn
    ), date_seq AS (
    SELECT
        DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH),
                INTERVAL(sn) DAY) AS this_day
    FROM
      serialnum
    )
    SELECT
        Day(InvoiceDate) date
        , Sum(IFNULL(InvoiceAmount, 0)) sales
    FROM
        date_seq
        LEFT JOIN
        test_gmail_com.sales
    ON
        date_seq.this_day = DAY(test_gmail_com.sales.InvoiceDate)
    WHERE
        year(InvoiceDate) = Year(current_date())
        and
        Month(InvoiceDate) = Month(current_date())
    GROUP BY
        date
    ORDER BY
        date
    ;
    

    UPDATE

    Or, simpler still using the generate_date_array function:

    WITH date_seq AS (
    SELECT
      GENERATE_DATE_ARRAY(DATE_TRUNC(CURRENT_DATE(), MONTH), 
                          DATE_ADD(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH)
                                            , INTERVAL 1 MONTH)
                                   , INTERVAL -1 DAY)
                          , INTERVAL 1 DAY)
        AS this_day
    )
    SELECT
        Day(InvoiceDate) date
        , Sum(IFNULL(InvoiceAmount, 0)) sales
    FROM
        date_seq
        LEFT JOIN
        test_gmail_com.sales
    ON
        date_seq.this_day = DAY(test_gmail_com.sales.InvoiceDate)
    WHERE
        year(InvoiceDate) = Year(current_date())
        and
        Month(InvoiceDate) = Month(current_date())
    GROUP BY
        date
    ORDER BY
        date
    ;
    
    0 讨论(0)
提交回复
热议问题