I need to create an empty time table series for a report so I can left join activity from several tables to it. Every hour of the day does not necessarily have data, but I want it to show null or zero for inactivity instead of omitting that hour of the day.
In later versions of Postgres (post 8.0.2), this is easy in several ways:
SELECT unnest(array[0,1,2,3,4...]) as numbers
OR
CROSS JOIN (select generate_series as hours
from generate_series(now()::timestamp,
now()::timestamp + interval '1 day',
'1 hour'::interval
)) date_series
Redshift can run some of these commands, but throws an error when you attempt to run it in conjunction with any of the tables.
WHAT I NEED:
A reliable way to generate a series of numbers (e.g. 0-23) as a subquery that will run on redshift (uses postgres 8.0.2).
As long as you have a table that has more rows than your required series has numbers, this is what has worked for me in the past:
select
(row_number() over (order by 1)) - 1 as hour
from
large_table
limit 24
;
Which returns numbers 0-23
.
I'm not a big fan of querying a system table just to get a list of row numbers. If it's something constant and small enough like hours of a day, I would go with plain old UNION ALL
:
WITH
hours_in_day AS (
SELECT 0 AS hour
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
...
UNION ALL SELECT 23
)
And then joinhours_in_day
to whatever you want to.
Unfortunately, Amazon Redshift does not allow use of generate_series()
for table functions. The workaround seems to be creating a table of numbers.
See also:
- Using sql function generate_series() in redshift
- Generate Series in Redshift and MySQL, which does not seem correct but does introduce some interesting ideas
来源:https://stackoverflow.com/questions/38667215/redshift-how-can-i-generate-a-series-of-numbers-without-creating-a-table-called