问题
EDIT: I gave bad example data. Updated some details and switched out dummy data for sanitized, actual data.
Source system: Freshdesk via Stitch
Table Structure:
create or replace TABLE TICKETS (
CC_EMAILS VARIANT,
COMPANY VARIANT,
COMPANY_ID NUMBER(38,0),
CREATED_AT TIMESTAMP_TZ(9),
CUSTOM_FIELDS VARIANT,
DUE_BY TIMESTAMP_TZ(9),
FR_DUE_BY TIMESTAMP_TZ(9),
FR_ESCALATED BOOLEAN,
FWD_EMAILS VARIANT,
ID NUMBER(38,0) NOT NULL,
IS_ESCALATED BOOLEAN,
PRIORITY FLOAT,
REPLY_CC_EMAILS VARIANT,
REQUESTER VARIANT,
REQUESTER_ID NUMBER(38,0),
RESPONDER_ID NUMBER(38,0),
SOURCE FLOAT,
SPAM BOOLEAN,
STATS VARIANT,
STATUS FLOAT,
SUBJECT VARCHAR(16777216),
TAGS VARIANT,
TICKET_CC_EMAILS VARIANT,
TYPE VARCHAR(16777216),
UPDATED_AT TIMESTAMP_TZ(9),
_SDC_BATCHED_AT TIMESTAMP_TZ(9),
_SDC_EXTRACTED_AT TIMESTAMP_TZ(9),
_SDC_RECEIVED_AT TIMESTAMP_TZ(9),
_SDC_SEQUENCE NUMBER(38,0),
_SDC_TABLE_VERSION NUMBER(38,0),
EMAIL_CONFIG_ID NUMBER(38,0),
TO_EMAILS VARIANT,
PRODUCT_ID NUMBER(38,0),
GROUP_ID NUMBER(38,0),
ASSOCIATION_TYPE NUMBER(38,0),
ASSOCIATED_TICKETS_COUNT NUMBER(38,0),
DELETED BOOLEAN,
primary key (ID)
);
Note the variant field, "custom_fields". It undergoes an unfortunate transformation between the api and snowflake. The resulting field contains an array of 3 or more objects, each one a custom field. I do not have the ability to change the data format. Examples:
# values could be null
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "none"
}
]
# or values could have a combination of null and non-null values
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "concern"
}
]
# or they could all have non-null values
[
{
"name": "cf_request",
"value": "issue with timer"
},
{
"name": "cf_related_with",
"value": "timer stopped"
},
{
"name": "cf_question",
"value": "technical problem"
}
]
I would essentially like to pivot these into fields in a select query where the name attribute's value becomes a column header. Making the output similar to the following:
+----+------------------+-----------------+-------------------+-----------------------------+
| id | cf_request | cf_related_with | cf_question | all_other_fields |
+----+------------------+-----------------+-------------------+-----------------------------+
| 5 | issue with timer | timer stopped | technical problem | more data about this ticket |
| 6 | hq | laptop issues | some value | more data |
| 7 | a thing | about a thing | about something | more data |
+----+------------------+-----------------+-------------------+-----------------------------+
Is there a function that searches the values of array objects and returns objects with qualifying values? Something like:
select
id,
get_object_where(name = 'category', value) as category,
get_object_where(name = 'subcategory', value) as category,
get_object_where(name = 'subsubcategory', value) as category
from my_data_table
Unfortunately, PIVOT requires an aggregate function, I tried using min and max, but only get a return of null values. Something similar to this approach would be great if there is another syntax to do it that doesn't require aggregation.
with arr as (
select
id,
cs.value:name col_name,
cs.value:value col_value
from my_data_table,
lateral flatten(input => custom_fields) cs
)
select
*
from arr
pivot(col_name for col_value in ('category', 'subcategory', 'subsubcategory')
as p (id, category, subcategory, subsubcategory);
It is possible to use the following approach, but it is flawed in that any time a new custom field is added I have to add cases to account for new positions within the array.
select
id,
case
when custom_fields[0]:name = 'cf_request' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_request' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_request' then custom_fields[2]:value
when custom_fields[2]:name = 'cf_request' then custom_fields[3]:value
else null
end cf_request,
case
when custom_fields[0]:name = 'cf_related_with' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_related_with' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_related_with' then custom_fields[2]:value
when custom_fields[2]:name = 'cf_related_with' then custom_fields[3]:value
else null
end cf_related_with,
case
when custom_fields[0]:name = 'cf_question' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_question' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_question' then custom_fields[2]:value
when custom_fields[2]:name = 'cf_question' then custom_fields[3]:value
else null
end cf_question,
created_at
from my_db.my_schema.tickets;
回答1:
I think you almost had it. You just need to add a max() or min() around your col_name. As you stated, it needs an aggregate function, and something like max() or min() will work here, since it is aggregating on the name/value pairs that you have. If you have 2 subcategory values, for example, it'll pick the min/max value. From your example, that doesn't appear to be an issue, so it'll always choose the value you want. I was able to replicate your scenario with this query:
WITH x AS (
SELECT parse_json('[{"name": "category","value": "Bikes"},{"name": "subcategory","value": "Mountain Bikes"},{"name": "subsubcategory","value": "hardtail bikes"}]')::VARIANT as field_var
),
arr as (
select
seq,
cs.value:name::varchar col_name,
cs.value:value::varchar col_value
from x,
lateral flatten(input => x.field_var) cs
)
select
*
from arr
pivot(max(col_value) for col_name in ('category','subcategory','subsubcategory')) as p (seq, category, subcategory, subsubcategory);
来源:https://stackoverflow.com/questions/59966415/snowflake-pivot-attribute-values-into-columns-in-array-of-objects