REDSHIFT: How can I generate a series of numbers without creating a table called “numbers” in redshift (Postgres 8.0.2)?

问题

I need to create an empty time table series for a report so I can left join activity from several tables to it. Every hour of the day does not necessarily have data, but I want it to show null or zero for inactivity instead of omitting that hour of the day.

In later versions of Postgres (post 8.0.2), this is easy in several ways:

SELECT unnest(array[0,1,2,3,4...]) as numbers

CROSS JOIN (select generate_series as hours from generate_series(now()::timestamp, now()::timestamp + interval '1 day', '1 hour'::interval )) date_series

Redshift can run some of these commands, but throws an error when you attempt to run it in conjunction with any of the tables.

WHAT I NEED:

A reliable way to generate a series of numbers (e.g. 0-23) as a subquery that will run on redshift (uses postgres 8.0.2).

回答1:

As long as you have a table that has more rows than your required series has numbers, this is what has worked for me in the past:

select
    (row_number() over (order by 1)) - 1 as hour
from
    large_table
limit 24
;

Which returns numbers 0-23.

回答2:

I'm not a big fan of querying a system table just to get a list of row numbers. If it's something constant and small enough like hours of a day, I would go with plain old UNION ALL:

WITH 
  hours_in_day AS (
    SELECT 0 AS hour
    UNION ALL SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    ...
    UNION ALL SELECT 23
  )

And then joinhours_in_day to whatever you want to.

回答3:

Unfortunately, Amazon Redshift does not allow use of generate_series() for table functions. The workaround seems to be creating a table of numbers.