I am following this method to pivot a large table in Redshift:
Pivot a table with Amazon RedShift / PostgreSQL
However, I have a large number of groups to pivot.
If you want to pivot an arbitrary number of groups, you can combine the groups into a JSON string and then extract the groups you are interested in with the Redshift JSON functions. You probably do not want to do this for very large data sets.
Here is the basic idea based on the sample data in the question linked above:
select DimensionA, DimensionB,
       json_extract_path_text(json_pivot, 'm1') as m1,
       json_extract_path_text(json_pivot, 'm2') as m2
from (
    select DimensionA, DimensionB,
           '{' || listagg(quote_ident(MetricName) || ':' || quote_ident(MetricValue), ',')
                  within group (order by MetricName) || '}' as json_pivot
    from to_pivot
    group by DimensionA, DimensionB
) p  -- Redshift requires an alias on a derived table
In practice you would not run it exactly like that. The inner select is what you would use to generate your "pivoted" table, and the outer select shows how to reference specific group values.
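A sketch of that two-step usage, with a hypothetical table name to_pivot_json for the materialized result:

-- Step 1: materialize the aggregated JSON once (table name is hypothetical).
create table to_pivot_json as
select DimensionA, DimensionB,
       '{' || listagg(quote_ident(MetricName) || ':' || quote_ident(MetricValue), ',')
              within group (order by MetricName) || '}' as json_pivot
from to_pivot
group by DimensionA, DimensionB;

-- Step 2: pull out only the groups you need, on demand.
select DimensionA, DimensionB,
       json_extract_path_text(json_pivot, 'm1') as m1
from to_pivot_json;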
This does not account for duplicate group records for the same dimension combination, like the following:
DimensionA DimensionB MetricName MetricValue
---------- ---------- ---------- -----------
dimA1 dimB2 m1 v13
dimA1 dimB2 m1 v23
If that is a possibility in your data, you will need to decide how to handle it. I am not sure how the query above would behave as written; my guess is that the first occurrence would be extracted.
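One hedged option is to collapse duplicates to a single value per metric before aggregating, for example by taking the max (the choice of max here is arbitrary, just deterministic):

select DimensionA, DimensionB,
       '{' || listagg(quote_ident(MetricName) || ':' || quote_ident(MetricValue), ',')
              within group (order by MetricName) || '}' as json_pivot
from (
    -- keep one row per (DimensionA, DimensionB, MetricName)
    select DimensionA, DimensionB, MetricName,
           max(MetricValue) as MetricValue
    from to_pivot
    group by DimensionA, DimensionB, MetricName
) dedup
group by DimensionA, DimensionB;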
This could probably also be done with a combination of LISTAGG and REGEXP_SUBSTR, using two custom delimiters.
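A rough sketch of that delimiter-based alternative, assuming metric names and values contain neither '|' nor ',' (the delimiters and the REGEXP_REPLACE cleanup step are my own choices, not from the original answer):

select DimensionA, DimensionB,
       -- grab "m1|<value>" from the delimited string, then strip the prefix
       regexp_replace(
           regexp_substr(pivot_str, '(^|,)m1\\|[^,]*'),
           '(^|,)m1\\|', '') as m1
from (
    select DimensionA, DimensionB,
           listagg(MetricName || '|' || MetricValue, ',')
             within group (order by MetricName) as pivot_str
    from to_pivot
    group by DimensionA, DimensionB
) s;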
Using varchar(max) for the JSON column type will give you 65535 bytes, which should be room for a couple thousand categories.
Explained slightly differently here.