How to get substring from column which contains records for filter and group by clause in AWS Redshift database.
I have table with records like:
Table_Id
@JonScott, @AlexYes and other pals who struggle with similar kinda situations.
I found more better approach other than suggested by @AlexYes.
What I did, I flatter category column which result individual records. Which I can further process.
Query:
select row_number() over(order by 1) as r1,
to_char(timestamptz 'epoch' + date_time * interval '1 second', 'yyyy-mm-dd') AS DAY,
split_part(categories, ';', numbers.n) as catg,
value
from <TABLE>
join numbers
on numbers.n <= regexp_count(category_string, ';') + 1 <OTHER_CONDITIONS>
Explanation:
Two functions are useful here: first, the split_part function, which takes a string, splits it on ';' delimiter, and returns the first, second, ... , nth value specified from the split string; second, regexp_count, which tells us how many times a particular pattern is found in our string.
To do this fully dynamically, you need to transpose or pivot values in "categories" column into separate rows. Unfortunately, a "fully dynamic" solution (without knowing the different values beforehand) is NOT possible using redshift.
Your options are as follows:
Use the method suggested by AlexYes in another answer. This is semi-dynamic and is probably your best option.
Outside of Redshift, run some ETL code to perform the column -> multiple rows ETL.
Create a hardcoded type solution, and perform the pivot something like this:
select table_id,'ABC1' as category, case when concat(Categories,';') ilike '%ABC1;%' then value else 0 end as value from your_table union all select table_id,'ABC1-1' as category, case when concat(Categories,';')ilike '%ABC1-1;%' then value else 0 end as value from your_table union all
etc
If you have at most 3 values in any "categories" cell you can unnest the cells, get the list of unique values and use that list in a join condition like this:
WITH
values as (
select distinct category
from (
select distinct split_part(categories,';',1) as category from your_table
union select distinct split_part(categories,';',2) from your_table
union select distinct split_part(categories,';',3) from your_table
)
where nullif(category,'') is not null
)
SELECT
t2.category
,sum(t1.value)
FROM your_table t1
JOIN values t2
ON split_part(categories,';',1)=t2.category
OR split_part(categories,';',2)=t2.category
OR split_part(categories,';',3)=t2.category
if you have more than 3 options just add another split_part
level both in WITH
part and the join condition