generate_series() method fails in Redshift

Question

When I run the SQL Query:

 select generate_series(0,g)
 from ( select date(date1) - date(date2) as g from mytable );

It returns an error:

 INFO:  Function "generate_series(integer,integer)" not supported.
 ERROR:  Specified types or functions (one per INFO message) not supported 
 on Redshift tables.

But when I run this query:

select  generate_series(0, g) from (select 5 as g)

It returns the below response:

 generate_series
-----------------
 0
 1
 2
 3
 4
 5
(6 rows)

Why does the second query work, while the first fails?

Answer 1:

The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the developer guide.

In your examples, the second query runs entirely on the leader node because it does not need to scan any table data, while the first reads from a table and therefore has to run on the compute node(s), where generate_series() is not supported.

UPDATE:

generate_series now works in Redshift for queries such as this one, which does not read from any table:

SELECT CURRENT_DATE::TIMESTAMP  - (i * interval '1 day') as date_datetime 
FROM generate_series(1,31) i 
ORDER BY 1

This generates the dates of the previous 31 days, because generate_series(1,31) returns 31 values.

Answer 2:

You can use a window function to achieve a similar result. This requires an existing table (such as stv_blocklist) to seed from. It must have at least as many rows as you need, but not so many that the query slows down.

with days as (
    select (dateadd(day, -row_number() over (order by true), sysdate::date)) as day 
    from [other_existing_table] limit 30
)
select day from days order by 1 asc

You can use this method to get other time ranges as well for bucketing purposes. This version generates all the minutes for the previous day so you could do a left join against it and bucket your data.

with buckets AS (
    select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute 
    from [other_table] limit 1440
)
select minute from buckets order by 1 asc
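
The left join mentioned above could look roughly like the sketch below. The names seed_table, events, and event_time are hypothetical and do not come from the answer; seed_table stands in for any existing table with at least 1440 rows.

```sql
-- Sketch only: count events per minute of the previous day, keeping empty minutes.
-- seed_table, events and event_time are hypothetical names.
with buckets as (
    select dateadd(minute, -row_number() over (order by true), sysdate::date) as minute
    from seed_table
    limit 1440
)
select b.minute,
       count(e.event_time) as event_count   -- 0 for minutes with no events
from buckets b
left join events e
       on date_trunc('minute', e.event_time) = b.minute
group by b.minute
order by b.minute;
```

Because count() skips the NULLs produced by the left join, minutes without any matching rows still appear in the result with a count of 0.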

I may have first seen this here.

Answer 3:

You are correct that this does not work on Redshift. See here.

You could use something like this

with ten_numbers as (
    select 0 as num union select 1 union select 2 union select 3 union select 4
    union select 5 union select 6 union select 7 union select 8 union select 9
)
, generated_numbers as (
    -- four digits give 0..9999; subtracting 5000 shifts the range to -5000..4999
    select (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num - 5000 as gen_num
    from ten_numbers as t1
      join ten_numbers as t2 on 1 = 1
      join ten_numbers as t3 on 1 = 1
      join ten_numbers as t4 on 1 = 1
)
select gen_num from generated_numbers
where gen_num between -10 and 0
order by 1;
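
Applied to the query from the question, the same digit cross join can stand in for generate_series(0, g). This is a minimal sketch, assuming mytable has the date1 and date2 columns from the question and that the day difference stays below 1000:

```sql
-- Sketch only: emulate generate_series(0, g) for every row of mytable.
-- Assumes mytable(date1, date2) as in the question and g < 1000.
with digits as (
    select 0 as d union all select 1 union all select 2 union all select 3
    union all select 4 union all select 5 union all select 6 union all select 7
    union all select 8 union all select 9
),
numbers as (
    -- three digits give 0..999
    select 100 * h.d + 10 * t.d + u.d as n
    from digits h
    cross join digits t
    cross join digits u
)
select m.date1, m.date2, numbers.n
from mytable m
join numbers
  on numbers.n between 0 and date(m.date1) - date(m.date2)
order by m.date1, m.date2, numbers.n;
```

Rows where date1 is earlier than date2 produce no output, which matches what generate_series(0, g) returns for a negative g.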

Answer 4:

You are not using PostgreSQL. You are using Amazon Redshift.

Amazon Redshift does not support generate_series when used with Redshift tables. It says it right there in the error message.

Either use real PostgreSQL, or if you need Redshift's features, you must also work within the limitations of Redshift.

Your second example works because it does not use any Redshift tables.

Answer 5:

This works here (pg-9.3.3). Maybe your issue is just the result of a Redshift "feature"?

CREATE TABLE mytable
        ( date1 timestamp
        , date2 timestamp
        );
INSERT INTO mytable(date1,date2) VALUES
( '2014-03-30 12:00:00' , '2014-04-01 12:00:00' );

SELECT  generate_series(0, ss.g) FROM
   ( SELECT date(date2) - date(date1) AS g
     FROM mytable
   ) ss ;

Answer 6:

Why it's not working was explained above. Still, the question "what can we do about this?" is open.

If you develop a BI system on any platform (with generators supported or not), it is very handy to have dimension tables with sequences of numbers and dates. How can you create one in Redshift?

1. In Postgres, produce the necessary sequence using a generator.
2. Export it to CSV.
3. Create a table with the same schema in Redshift.
4. Import the CSV from Step 2 into Redshift.
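
The steps above could be sketched as follows. The answer does not say how the CSV reaches Redshift; this sketch assumes the file is uploaded to S3 and loaded with COPY. The file name, bucket, and IAM role are hypothetical placeholders.

```sql
-- Steps 1 and 2 (PostgreSQL, run in psql): generate the dates and export them.
-- \copy is a psql meta-command and must stay on a single line.
\copy (select row_number() over (order by d) as id, d::date from generate_series('2017-01-01'::timestamp, '2020-01-01'::timestamp, interval '1 day') as d) to 'calendar.csv' with (format csv)

-- Step 3 (Redshift): create a table with the same schema.
create table calendar (
    id     integer not null,
    "date" date    not null
);

-- Step 4 (Redshift): load the CSV after uploading it to S3.
-- The bucket and role ARN below are hypothetical; replace them with your own.
copy calendar
from 's3://my-bucket/calendar.csv'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
format as csv;
```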

Imagine you have created a very simple table called calendar:

 id, date
 1, 2017-01-01
 2, 2017-01-02
 ..., ...
 xxx, 2020-01-01

So your query will look like this:

SELECT t.id, t.date_1, t.date_2, c.id as date_id, c.date
FROM mytable t
JOIN calendar c
ON c.date BETWEEN t.date_1::date AND t.date_2::date
ORDER BY 1,4

In the calendar table you can also store the first date of each week, month, and quarter, as well as weekday names (Mon, Tue, etc.), which makes such a table very effective for time-based aggregations.

Source: https://stackoverflow.com/questions/22759980/generate-series-method-fails-in-redshift

Tags

amazon-redshift

generate-series