Question
I am new to Athena and I am trying to understand how to turn multiple columns from long to wide format. It seems like Presto is what is needed, but I have only been able to apply map_agg to one variable. I think the final outcome below can be achieved with multimap_agg, but I cannot quite get it to work.
Below I walk through my steps and data. If you have some suggestions or questions, please let me know!
First, the data starts like this:
id | letter | number | value
------------------------------------
123 | a | 1 | 62
123 | a | 2 | 38
123 | a | 3 | 44
123 | b | 1 | 74
123 | b | 2 | 91
123 | b | 3 | 97
123 | c | 1 | 38
123 | c | 2 | 98
123 | c | 3 | 22
456 | a | 1 | 99
456 | a | 2 | 33
456 | a | 3 | 81
456 | b | 1 | 34
456 | b | 2 | 79
456 | b | 3 | 43
456 | c | 1 | 86
456 | c | 2 | 60
456 | c | 3 | 59
Then I transform the data into the below by filtering with the WHERE clause and then joining:
id | letter | 1 | 2 | 3
----------------------------
123 | a | 62 | 38 | 44
123 | b | 74 | 91 | 97
123 | c | 38 | 98 | 22
456 | a | 99 | 33 | 81
456 | b | 34 | 79 | 43
456 | c | 86 | 60 | 59
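The query for this intermediate step was not included in the question. A minimal sketch of what a filter-and-join approach could look like, assuming the source table is named mytable (a hypothetical name here) and each id/letter pair has exactly one row per number:

```sql
-- Sketch only: table name and aliases (n1, n2, n3) are assumptions.
-- Each subquery keeps the rows for one number; the joins line them up
-- side by side per id/letter pair.
SELECT
  n1.id,
  n1.letter,
  n1.value AS "1",
  n2.value AS "2",
  n3.value AS "3"
FROM (SELECT id, letter, value FROM mytable WHERE number = 1) n1
JOIN (SELECT id, letter, value FROM mytable WHERE number = 2) n2
  ON n1.id = n2.id AND n1.letter = n2.letter
JOIN (SELECT id, letter, value FROM mytable WHERE number = 3) n3
  ON n1.id = n3.id AND n1.letter = n3.letter
ORDER BY n1.id, n1.letter
```

Because these are inner joins, an id/letter pair that lacks one of the three numbers would drop out of the result entirely.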
For the final outcome, I would like to transform it into the below:
id | a_1 | a_2 | a_3 | b_1 | b_2 | b_3 | c_1 | c_2 | c_3
--------------------------------------------------------------------------
123 | 62 | 38 | 44 | 74 | 91 | 97 | 38 | 98 | 22
456 | 99 | 33 | 81 | 34 | 79 | 43 | 86 | 60 | 59
Answer 1:
You can use window functions and conditional aggregation. This requires that you know in advance the possible letters, and the maximum rows per id/letter tuple:
select
    id,
    max(case when letter = 'a' and rn = 1 then value end) a_1,
    max(case when letter = 'a' and rn = 2 then value end) a_2,
    max(case when letter = 'a' and rn = 3 then value end) a_3,
    max(case when letter = 'b' and rn = 1 then value end) b_1,
    max(case when letter = 'b' and rn = 2 then value end) b_2,
    max(case when letter = 'b' and rn = 3 then value end) b_3,
    max(case when letter = 'c' and rn = 1 then value end) c_1,
    max(case when letter = 'c' and rn = 2 then value end) c_2,
    max(case when letter = 'c' and rn = 3 then value end) c_3
from (
    select
        t.*,
        row_number() over(partition by id, letter order by number) rn
    from mytable t
) t
group by id
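To illustrate what the window function contributes, here is a small sketch that runs only the inner query against inline rows instead of a real table. The rows are hypothetical: the number values 10, 20, 30 are made up to show non-consecutive numbering.

```sql
-- Hypothetical inline data standing in for mytable; number is not 1, 2, 3 here.
WITH mytable AS (
  SELECT *
  FROM (
    VALUES
      (123, 'a', 10, 62),
      (123, 'a', 20, 38),
      (123, 'a', 30, 44),
      (123, 'b', 10, 74),
      (123, 'b', 20, 91),
      (123, 'b', 30, 97)
  ) AS v (id, letter, number, value)
)
SELECT
  t.*,
  -- Position of each row within its id/letter group, ordered by number
  row_number() OVER (PARTITION BY id, letter ORDER BY number) AS rn
FROM mytable t
ORDER BY id, letter, number
```

Within each id/letter group, rn comes out as 1, 2, 3 regardless of the actual number values, which is what the rn = 1, rn = 2, rn = 3 conditions in the outer query rely on.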
Actually, if the numbers are always 1, 2, 3, then you don't even need the window function:
select
    id,
    max(case when letter = 'a' and number = 1 then value end) a_1,
    max(case when letter = 'a' and number = 2 then value end) a_2,
    max(case when letter = 'a' and number = 3 then value end) a_3,
    max(case when letter = 'b' and number = 1 then value end) b_1,
    max(case when letter = 'b' and number = 2 then value end) b_2,
    max(case when letter = 'b' and number = 3 then value end) b_3,
    max(case when letter = 'c' and number = 1 then value end) c_1,
    max(case when letter = 'c' and number = 2 then value end) c_2,
    max(case when letter = 'c' and number = 3 then value end) c_3
from mytable t
group by id
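As a variation on the same idea, Presto also accepts a FILTER clause on aggregates, which some find easier to read than CASE expressions. The sketch below is self-contained: it assumes the engine version in use supports FILTER and inline VALUES, and it substitutes the question's sample rows for mytable so it can be run without creating a table.

```sql
-- Sample rows from the question, inlined in place of a real table.
WITH mytable AS (
  SELECT *
  FROM (
    VALUES
      (123, 'a', 1, 62), (123, 'a', 2, 38), (123, 'a', 3, 44),
      (123, 'b', 1, 74), (123, 'b', 2, 91), (123, 'b', 3, 97),
      (123, 'c', 1, 38), (123, 'c', 2, 98), (123, 'c', 3, 22),
      (456, 'a', 1, 99), (456, 'a', 2, 33), (456, 'a', 3, 81),
      (456, 'b', 1, 34), (456, 'b', 2, 79), (456, 'b', 3, 43),
      (456, 'c', 1, 86), (456, 'c', 2, 60), (456, 'c', 3, 59)
  ) AS v (id, letter, number, value)
)
SELECT
  id,
  -- Each aggregate only sees the rows matching its FILTER condition
  max(value) FILTER (WHERE letter = 'a' AND number = 1) AS a_1,
  max(value) FILTER (WHERE letter = 'a' AND number = 2) AS a_2,
  max(value) FILTER (WHERE letter = 'a' AND number = 3) AS a_3,
  max(value) FILTER (WHERE letter = 'b' AND number = 1) AS b_1,
  max(value) FILTER (WHERE letter = 'b' AND number = 2) AS b_2,
  max(value) FILTER (WHERE letter = 'b' AND number = 3) AS b_3,
  max(value) FILTER (WHERE letter = 'c' AND number = 1) AS c_1,
  max(value) FILTER (WHERE letter = 'c' AND number = 2) AS c_2,
  max(value) FILTER (WHERE letter = 'c' AND number = 3) AS c_3
FROM mytable
GROUP BY id
ORDER BY id
```

With one row per id/letter/number combination, as in the sample data, this should produce the two-row wide table shown in the question.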
Answer 2:
Athena needs the columns to be known at query time, but the next best thing is using a map, as you hint at in your question.
One way to achieve the results you are after is this query (the_table refers to the first table in your question, the one with id, letter, number, and value columns):
SELECT
    id,
    map_agg(letter || '_' || CAST(number AS varchar), value) AS letter_number_value
FROM the_table
GROUP BY id
Which gives this result:
id | letter_number_value
----+-------------------------------------------------------------------------
123 | {a_1=62, a_2=38, a_3=44, b_1=74, b_2=91, b_3=97, c_1=38, c_2=98, c_3=22}
456 | {a_1=99, a_2=33, a_3=81, b_1=34, b_2=79, b_3=43, c_1=86, c_2=60, c_3=59}
I cheated slightly by manually sorting the map keys. If you run the query they will come back in arbitrary order, but sorted keys make it easier to see that the result is the desired one.
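If separate columns are still wanted and the keys are known in advance, the map can be unpacked in an outer query. This is a sketch that builds on the query above; the alias kv is an assumption. element_at returns NULL when a key is missing, whereas the subscript form kv['a_1'] raises an error in that case.

```sql
SELECT
  id,
  -- element_at yields NULL for keys that are absent from the map
  element_at(kv, 'a_1') AS a_1,
  element_at(kv, 'a_2') AS a_2,
  element_at(kv, 'a_3') AS a_3,
  element_at(kv, 'b_1') AS b_1,
  element_at(kv, 'b_2') AS b_2,
  element_at(kv, 'b_3') AS b_3,
  element_at(kv, 'c_1') AS c_1,
  element_at(kv, 'c_2') AS c_2,
  element_at(kv, 'c_3') AS c_3
FROM (
  SELECT
    id,
    map_agg(letter || '_' || CAST(number AS varchar), value) AS kv
  FROM the_table
  GROUP BY id
) m
ORDER BY id
```

The column list still has to be written out by hand, so this only saves effort when the map is needed for other purposes as well.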
Please note that this assumes there are no duplicate letter/number combinations. If there are, I think it is undefined which value ends up in the result.
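One way to check that assumption before relying on map_agg is to look for letter/number pairs that occur more than once per id. A sketch against the same the_table; the column name n_rows is made up for illustration, and an empty result means there are no duplicates:

```sql
SELECT
  id,
  letter,
  number,
  count(*) AS n_rows
FROM the_table
GROUP BY id, letter, number
-- Keep only combinations that appear more than once
HAVING count(*) > 1
ORDER BY id, letter, number
```

If duplicates do exist and all values should be kept, multimap_agg with the same key expression collects them into an array per key instead of a single value.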
Also note that Athena's output format for maps is ambiguous, and there are situations where you can end up with unparseable results (for example, when keys or values include equal signs or commas). I would therefore recommend casting the map to JSON and using a JSON parser in your application code, e.g. CAST(map_agg(…) AS JSON).
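Spelled out in full, the JSON variant of the query could look like the sketch below. It assumes the same the_table; the json_format call and the output column name are optional additions, with json_format turning the JSON value into a plain varchar string.

```sql
SELECT
  id,
  json_format(
    -- Map keys are varchar, which the cast to JSON requires
    CAST(map_agg(letter || '_' || CAST(number AS varchar), value) AS JSON)
  ) AS letter_number_value_json
FROM the_table
GROUP BY id
```

Each row then carries a JSON object string that a standard JSON parser can read, without depending on Athena's map rendering.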
Source: https://stackoverflow.com/questions/63142257/athena-presto-multiple-columns-from-long-to-wide