问题
I have a table with hiredate (Date) ,First Name (String) and Sur Name (string) like this:
hireDate First Name Surname
13-oct-14 Cintia Roxana Padilla Julca
28-oct-14 Conor McAteer
28-oct-14 Paolo Mesia Macher
28-oct-14 William Anthony Whelan
15-nov-14 Peter Michael Coates
13-feb-15 Natalie Conche
15-mar-15 Beatriz Vargas Huanca
01-may-15 Walter Calle Chenccnes
04-may-15 Sarah Louise Price
And I made a view of a frequency of hire_dates(DATE) and the cumulative frequency in the other column like this:
Row hireDate Count Cumulative
1 13/10/2014 1 1
2 28/10/2014 3 4
3 15/11/2014 1 5
4 13/02/2015 1 6
5 15/03/2015 1 7
6 09/04/2015 1 8
7 15/04/2015 1 9
8 01/05/2015 1 10
And the query goes like this:
WITH
Data AS (
SELECT
hireDate,
COUNT(1) AS Count
FROM
`human-resources-221122.human_resources.employees_view`
WHERE
status <> "cancelled"
GROUP BY
1 )
SELECT
hireDate,
Count,
SUM(Count) OVER (ORDER BY hireDate ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS Cumulative
FROM
Data
ORDER BY
hireDate ASC
But I need to see the numbers by month and year with zeros in those places that there are no count, something like this:
Hire_Month Hire_Year Count Cumulative
October 2014 4 4
November 2014 1 5
December 2014 0 5
January 2015 0 5
February 2015 1 6
March 2015 1 7
April 2015 2 9
May 2015 1 10
Thanks in advance.
回答1:
Note the use of GENERATE_DATE_ARRAY
and RIGHT JOIN
to get the desired results:
WITH data AS (
SELECT *
FROM UNNEST ([
STRUCT(DATE("2014-12-03") AS d, 4 AS a)
, STRUCT("2015-01-05", 7)
, STRUCT("2015-03-05", 1)
])
), all_months AS (
SELECT month
FROM UNNEST(GENERATE_DATE_ARRAY(
(SELECT DATE_TRUNC(MIN(d), MONTH) FROM data)
, (SELECT MAX(d) FROM data)
, INTERVAL 1 MONTH)
) AS month
)
SELECT month, IFNULL(SUM(a),0) a, SUM(SUM(a)) OVER(ORDER BY month) a_cum
FROM data
RIGHT JOIN all_months
ON DATE_TRUNC(d, MONTH)=month
GROUP BY month
ORDER BY month
Now, if we are just counting, you can use the fact that a LEFT/RIGHT JOIN will have nulls for empty months. This is how the query could be adapted to an arbitrary table (Wikipedia here):
WITH data AS (
SELECT *, DATE(datehour) d
FROM `fh-bigquery.wikipedia_v3.pageviews_2018`
WHERE wiki='pt'
AND (datehour BETWEEN '2018-09-30' AND '2018-09-30'
OR datehour BETWEEN '2018-12-01' AND '2018-12-02'
)
AND title LIKE 'Calif%'
), all_months AS (
SELECT month
FROM UNNEST(GENERATE_DATE_ARRAY(
(SELECT DATE_TRUNC(MIN(d), MONTH) FROM data)
, (SELECT MAX(d) FROM data)
, INTERVAL 1 MONTH)
) AS month
)
SELECT month, COUNT(d) c, SUM(COUNT(d)) OVER(ORDER BY month) a_cum
FROM data
RIGHT JOIN all_months
ON DATE_TRUNC(d, MONTH)=month
GROUP BY month
ORDER BY month
来源:https://stackoverflow.com/questions/54693016/fill-with-zeros-in-a-frequency-of-dates-group-by-month-and-year-in-big-query