Question
When I run this SQL query:
select generate_series(0,g)
from ( select date(date1) - date(date2) as g from mytable );
It returns an error:
INFO: Function "generate_series(integer,integer)" not supported.
ERROR: Specified types or functions (one per INFO message) not supported
on Redshift tables.
But when I run this query:
select generate_series(0, g) from (select 5 as g)
It returns the below response:
generate_series
-----------------
0
1
2
3
4
5
(6 rows)
Why does the second query work, while the first fails?
Answer 1:
The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the developer guide.
In the specific examples, the second query is executed entirely on the leader node because it does not need to scan any actual table data. The first query selects data from a table, so it would be executed on the compute node(s), where generate_series() is not available.
UPDATE:
generate_series now works on Redshift for queries like this one, which does not read from any table:
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,31) i
ORDER BY 1
This generates one timestamp per day for the previous 31 days.
Answer 2:
You can use a window function to achieve a similar result. This requires an existing table (such as stv_blocklist) to seed from. It needs at least as many rows as you want to generate, but not so many that it slows the query down.
with days as (
select (dateadd(day, -row_number() over (order by true), sysdate::date)) as day
from [other_existing_table] limit 30
)
select day from days order by 1 asc
You can use this method to get other time ranges as well for bucketing purposes. This version generates all the minutes for the previous day so you could do a left join against it and bucket your data.
with buckets AS (
select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute
from [other_table] limit 1440
)
select minute from buckets order by 1 asc
I may have first seen this here.
Answer 3:
You are correct that this does not work on Redshift. See here.
You could use something like this
-- ten_numbers holds the digits 0 through 9
with ten_numbers as (select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0)
-- cross join four copies of the digits to build every integer from -5000 to 4999
,generated_numbers AS
(
SELECT (1000*t1.num) + (100*t2.num) + (10*t3.num) + t4.num-5000 as gen_num
FROM ten_numbers AS t1
JOIN ten_numbers AS t2 ON 1 = 1
JOIN ten_numbers AS t3 ON 1 = 1
JOIN ten_numbers AS t4 ON 1 = 1
)
select gen_num from generated_numbers
where gen_num between -10 and 0
order by 1;
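To see what range the arithmetic in that query covers, here is a small Python sketch that repeats the same digit combination outside the database. It is only an illustration of the formula, not something you would run against Redshift; the variable names are my own.

```python
from itertools import product

# Same arithmetic as the SQL: four digits (thousands, hundreds, tens, ones),
# combined into one integer and then shifted down by 5000.
gen_nums = sorted(
    1000 * d1 + 100 * d2 + 10 * d3 + d4 - 5000
    for d1, d2, d3, d4 in product(range(10), repeat=4)
)

# Equivalent of: where gen_num between -10 and 0
selected = [n for n in gen_nums if -10 <= n <= 0]
```

The four-way cross join yields 10,000 distinct integers, from -5000 to 4999, and the final filter keeps the 11 values from -10 to 0.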
Answer 4:
You are not using PostgreSQL. You are using Amazon Redshift.
Amazon Redshift does not support generate_series when used with Redshift tables. It says so right there in the error message.
Either use real PostgreSQL, or if you need Redshift's features, you must also work within the limitations of Redshift.
Your second example works because it does not use any Redshift tables.
Answer 5:
This works here (PostgreSQL 9.3.3). Maybe your issue is just the result of a Redshift "feature"?
CREATE TABLE mytable
( date1 timestamp
, date2 timestamp
);
INSERT INTO mytable(date1,date2) VALUES
( '2014-03-30 12:00:00' , '2014-04-01 12:00:00' );
SELECT generate_series(0, ss.g) FROM
( SELECT date(date2) - date(date1) AS g
FROM mytable
) ss ;
Answer 6:
Why it's not working was explained above. Still, the question "what can we do about this?" is open.
If you develop a BI system on any platform (with generators supported or not), it is very handy to have dimension tables with sequences of numbers and dates. How can you create one in Redshift?
1. In Postgres, produce the necessary sequence using generate_series
2. Export it to CSV
3. Create a table with the same schema in Redshift
4. Import the CSV from Step 2 into Redshift
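If you don't have a Postgres instance handy, the CSV can also be produced with a short script. The following is a minimal Python sketch, not part of the method above: it swaps the Postgres generator for a plain loop, and the function names, the id/date column layout, and the date range are my own assumptions based on the calendar example in this answer.

```python
import csv
from datetime import date, timedelta


def calendar_rows(start, end):
    """Yield (id, iso_date) pairs for every day from start to end, inclusive."""
    day_count = (end - start).days + 1
    for offset in range(day_count):
        yield offset + 1, (start + timedelta(days=offset)).isoformat()


def write_calendar_csv(handle, start, end):
    """Write an 'id,date' header plus one row per day to an open text file.

    Returns the number of data rows written.
    """
    writer = csv.writer(handle)
    writer.writerow(["id", "date"])
    count = 0
    for row in calendar_rows(start, end):
        writer.writerow(row)
        count += 1
    return count
```

To use it, open a file with open("calendar.csv", "w", newline="") and pass the handle to write_calendar_csv together with, say, date(2017, 1, 1) and date(2020, 1, 1). Loading the resulting file into Redshift is still a separate step.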
Imagine you have created a very simple table called calendar:
id, date
1, 2017-01-01
2, 2017-01-02
..., ...
xxx, 2020-01-01
So your query will look like this:
SELECT t.id, t.date_1, t.date_2, c.id as date_id, c.date
FROM mytable t
JOIN calendar c
ON c.date BETWEEN t.date_1::date AND t.date_2::date
ORDER BY 1,4
The calendar table can also hold the first date of each week, month, and quarter, as well as weekday names (Mon, Tue, etc.), which makes such a table very effective for time-based aggregations.
Source: https://stackoverflow.com/questions/22759980/generate-series-method-fails-in-redshift