I am following this method to pivot a large table in Redshift:
Pivot a table with Amazon RedShift / PostgreSQL
However, I have a large number of groups to pivot.
If you want to pivot an arbitrary number of groups, you can combine the groups into a JSON string and then extract the groups you are interested in with the Redshift JSON functions. You probably do not want to do this for very large data sets.
Here is the basic idea based on the sample data in the question linked above:
select DimensionA, DimensionB,
       json_extract_path_text(json_pivot, 'm1') as m1,
       json_extract_path_text(json_pivot, 'm2') as m2
from (
    select DimensionA, DimensionB,
           '{' || listagg(quote_ident(MetricName) || ':' || quote_ident(MetricValue), ',')
                  within group (order by MetricName) || '}' as json_pivot
    from to_pivot
    group by DimensionA, DimensionB
) p  -- Redshift requires an alias on a derived table
In practice you would not run it exactly like that. The inner select is what you would use to generate your "pivoted" table, and the outer select shows how to reference specific group values.
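A sketch of that two-step usage, with a hypothetical table name to_pivot_json for the materialized result:

-- Step 1: materialize the aggregated JSON once (table name is hypothetical).
create table to_pivot_json as
select DimensionA, DimensionB,
       '{' || listagg(quote_ident(MetricName) || ':' || quote_ident(MetricValue), ',')
              within group (order by MetricName) || '}' as json_pivot
from to_pivot
group by DimensionA, DimensionB;

-- Step 2: pull out only the groups you need, on demand.
select DimensionA, DimensionB,
       json_extract_path_text(json_pivot, 'm1') as m1
from to_pivot_json;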
This does not account for duplicate group records for the same dimension combination, like the following:
DimensionA DimensionB MetricName MetricValue
---------- ---------- ---------- -----------
dimA1 dimB2 m1 v13
dimA1 dimB2 m1 v23
If that is a possibility in your data, you will need to decide how to handle it. I am not sure how the query above would behave as written; my guess is that the first occurrence would be extracted.
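One hedged option is to collapse duplicates to a single value per metric before aggregating, for example by taking the max (the choice of max here is arbitrary, just deterministic):

select DimensionA, DimensionB,
       '{' || listagg(quote_ident(MetricName) || ':' || quote_ident(MetricValue), ',')
              within group (order by MetricName) || '}' as json_pivot
from (
    -- keep one row per (DimensionA, DimensionB, MetricName)
    select DimensionA, DimensionB, MetricName,
           max(MetricValue) as MetricValue
    from to_pivot
    group by DimensionA, DimensionB, MetricName
) dedup
group by DimensionA, DimensionB;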
This could probably also be done with a combination of LISTAGG and REGEXP_SUBSTR, using two custom delimiters.
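A rough sketch of that delimiter-based alternative, assuming metric names and values contain neither '|' nor ',' (the delimiters and the REGEXP_REPLACE cleanup step are my own choices, not from the original answer):

select DimensionA, DimensionB,
       -- grab "m1|<value>" from the delimited string, then strip the prefix
       regexp_replace(
           regexp_substr(pivot_str, '(^|,)m1\\|[^,]*'),
           '(^|,)m1\\|', '') as m1
from (
    select DimensionA, DimensionB,
           listagg(MetricName || '|' || MetricValue, ',')
             within group (order by MetricName) as pivot_str
    from to_pivot
    group by DimensionA, DimensionB
) s;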
Using varchar(max) for the JSON column type will give you 65535 bytes, which should be room for a couple thousand categories.
Explained slightly differently here.