presto

AWS Athena (Presto) how to transpose map to columns

Submitted by 青春壹個敷衍的年華 on 2021-02-05 10:51:23
Question: AWS Athena query question; I have a nested map in my rows, whose keys I would like to transpose into columns. I could name the columns explicitly, like items['label_a'], but in this case the keys are actually dynamic... From these rows:

{id=1, items={label_a=foo, label_b=foo}}
{id=2, items={label_a=bar, label_c=bar}}
{id=3, items={label_b=baz, label_c=baz}}

I would like to get a table like so:

| id | label_a | label_b | label_c |
------------------------------------
| 1  | foo     | foo     |         |
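A common approach when the keys of interest are known at query-authoring time (a sketch; the table name is a placeholder, and element_at returns NULL for keys absent from the map):

```sql
SELECT id,
       element_at(items, 'label_a') AS label_a,
       element_at(items, 'label_b') AS label_b,
       element_at(items, 'label_c') AS label_c
FROM my_table;
```

For truly dynamic keys, SQL requires a fixed column list, so the usual pattern is a first query (e.g. aggregating map_keys(items)) to discover the keys, which is then used to generate the second, wide query.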

athena presto - multiple columns from long to wide

Submitted by 柔情痞子 on 2021-02-04 08:37:25
Question: I am new to Athena and am trying to understand how to turn multiple columns from long to wide format. It seems like Presto is what is needed, but I've only been able to apply map_agg to one variable. I think my desired final outcome can be achieved with multimap_agg, but I cannot quite get it to work. Below I walk through my steps and data. If you have some suggestions or questions, please let me know! First, the data starts like this: id | letter | number | value ------------------
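One hedged sketch of the multimap_agg route, assuming the question's columns (id, letter, number, value) and a placeholder table name; multimap_agg collects a map from each key to an array of all its values, so it tolerates repeated keys where map_agg would not:

```sql
SELECT id,
       multimap_agg(letter, number) AS numbers_by_letter,
       multimap_agg(letter, value)  AS values_by_letter
FROM long_table
GROUP BY id;
```

From there, element_at(numbers_by_letter, 'a') yields the array of numbers for letter 'a', and the wide columns can be selected explicitly per key.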

Presto Series | An Introduction to Presto

Submitted by 99封情书 on 2021-02-03 07:04:28
Preface: Presto is an MPP-architecture OLAP query engine open-sourced by Facebook, a distributed SQL execution engine that can run queries over very large data sets across different data sources. Since I work with Presto, studying it helps with understanding SQL parsers, the implementation of common operators (such as table scan, join, and aggregation in SQL), resource management and scheduling, query optimization (such as vectorized execution and dynamic code generation), and why the various big-data components suit different scenarios. Through this series I hope to show how a SQL query can be executed efficiently at big-data scale. I (233酱) plan to keep updating the series from time to time. This article gives a first introduction to Presto in three parts: a usage example, Presto's application scenarios, and Presto's basic concepts.

Usage example: Suppose you want to run analytical SQL over data stored in different sources, such as HDFS, MySQL, and HBase. You only need to treat each data source as a Presto Connector and implement the Connector API exposed by the Presto SPI. A join query across HBase and ES is one example. Both the official Presto distribution and the community distribution already support many connectors, with the community edition slightly ahead. For the differences between the two, interested readers can consult reference [4] at the end of the article. In short, both are maintained primarily by the core engineers from Facebook; the community edition is updated more frequently, but recent versions require JDK 11, whereas the official edition runs on JDK 8.
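As a concrete illustration of the Connector idea, registering a data source in Presto amounts to dropping a catalog properties file into etc/catalog/. This hypothetical example wires up a MySQL instance (host, port, and credentials are placeholders):

```properties
# etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://example-host:3306
connection-user=presto_user
connection-password=secret
```

After a restart, tables become addressable as mysql.<schema>.<table> and can be joined against tables from other catalogs in a single query.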

Presto

Submitted by ↘锁芯ラ on 2021-02-01 08:03:35
Converting a timestamp to a string:

format_datetime(from_unixtime(time / 1000), 'yyyy-MM-dd HH:mm:ss')

Use WITH clauses liberally. When using Presto for analytics, consider merging several queries into one, using Presto's subquery support. This is a point where Presto usage differs from what we know from MySQL. For example:

WITH subquery_1 AS (
    SELECT a1, a2, a3
    FROM Table_1
    WHERE a3 BETWEEN 20180101 AND 20180131
), /* subquery subquery_1; note: multiple subqueries must be separated by commas */
subquery_2 AS (
    SELECT b1, b2, b3
    FROM Table_2
    WHERE b3 BETWEEN 20180101 AND 20180131
) /* do not put a comma after the last subquery, or the query fails */
SELECT
    subquery_1.a1, subquery_1.a2,
    subquery_2.b1, subquery_2.b2
FROM subquery_1
JOIN subquery_2 ON subquery_1.a3 = subquery_2.b3;

Query SQL optimization: select only the fields you need. Because storage is columnar, selecting only the required fields speeds up reads and reduces data volume. Avoid SELECT * to read all fields.

Is there a way to join two tables in SQL, not on your index row, without getting rid of null values

Submitted by 丶灬走出姿态 on 2021-01-29 20:43:31
Question: I have two tables in SQL that look similar to the following:

Code  Symbol  Value
1203  ABC     10.00
1208  XYZ     12.00
1222  null     9.00
1226  ABC      1.00

and

Symbol  Date
ABC     2020-06-07
XYZ     2020-06-08
QRS     2020-06-10

Currently, I am trying to join them as follows:

SELECT a.Code, a.Symbol, a.Value, b.Date
FROM table1 a
LEFT JOIN table2 b ON a.Symbol = b.Symbol

This returns the following output:

Code  Symbol  Value  Date
1203  ABC     10.00  2020-06-07
1208  XYZ     12.00  2020-06-08
1226  ABC      1.00  2020-06-07

However, I would
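A LEFT JOIN on equality should already keep the null-Symbol row (with a NULL Date), since unmatched left-side rows survive the join. If the intent is instead for NULLs on both sides to match each other, Presto's null-safe comparison can be used in the join predicate; a sketch using the question's table names:

```sql
SELECT a.Code, a.Symbol, a.Value, b.Date
FROM table1 a
LEFT JOIN table2 b
  ON a.Symbol IS NOT DISTINCT FROM b.Symbol;
```

Unlike `=`, IS NOT DISTINCT FROM evaluates NULL against NULL as true, so null keys pair up instead of being dropped by the predicate.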

AWS Athena row cast fails when key is a reserved keyword despite double quotes

Submitted by 社会主义新天地 on 2021-01-29 18:55:57
Question: I'm working with data in AWS Athena, and I'm trying to match the structure of some input data. This involves a nested structure where "from" is a key, and it consistently throws errors. I've narrowed the issue down to the fact that Athena queries don't work when you try to use reserved keywords as keys in rows. The following examples demonstrate this behavior. Even this simple case,

SELECT CAST(ROW(1) AS ROW("from" INTEGER))

fails with the following error: GENERIC_INTERNAL_ERROR: Unable to create
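If the output structure can tolerate it, one possible workaround (a sketch, not a fix for the ROW cast itself) is to carry the offending field as a map entry rather than a named row field, since map keys are string values and are never parsed as SQL identifiers:

```sql
-- hypothetical workaround: the key 'from' lives in a MAP, where reserved
-- words are ordinary strings
SELECT MAP(ARRAY['from'], ARRAY[1]) AS payload;
```

The value can then be read back with element_at(payload, 'from') without ever quoting an identifier.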

Special characters in AWS Athena show up as question marks

Submitted by 我的梦境 on 2021-01-29 11:08:42
Question: I've added a table in AWS Athena from a CSV file that uses the special characters "æøå". These show up as � in the output. The CSV file is Unicode-encoded. I've also tried changing the encoding to UTF-8, with no luck. I uploaded the CSV to S3 and then added the table to Athena using the following DDL:

CREATE EXTERNAL TABLE `regions_dk`(
  `postnummer` string COMMENT 'from deserializer',
  `kommuner` string COMMENT 'from deserializer',
  `regioner` string COMMENT 'from deserializer')
ROW
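When the file really is in a non-default encoding, LazySimpleSerDe lets you declare it via serialization.encoding. A hedged sketch of the DDL (the S3 path is a placeholder, and this assumes the CSV bytes actually are UTF-8; if the file was exported as UTF-16 "Unicode", it would need re-saving or a matching encoding value):

```sql
CREATE EXTERNAL TABLE `regions_dk`(
  `postnummer` string,
  `kommuner` string,
  `regioner` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = ',',
  'serialization.encoding' = 'UTF-8')
LOCATION 's3://example-bucket/regions_dk/';
```

A UTF-8 byte-order mark at the start of the file can also corrupt the first field, so exporting as UTF-8 without BOM is worth checking.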

Presto Unnest Array Inside Array

Submitted by 拥有回忆 on 2021-01-29 09:44:51
Question: I have the following query running on BigQuery, and it works fine:

select item_detail.location.zone, count(*)
FROM table t
CROSS JOIN UNNEST(items) as item_detail
group by 1;

But when I run the same query in Presto, it gives me an error, because my items structure is arrays of arrays. So I modified my query like this:

SELECT location.zone
FROM table
CROSS JOIN UNNEST(items) as t(item, quantity, location);

But it throws this error: Presto error: Unhandled type for
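In Presto, each level of array nesting needs its own UNNEST; one CROSS JOIN UNNEST only peels off the outer array. A sketch assuming a schema of array(array(row(item, quantity, location))), where location is itself a row with a zone field (these names are assumptions carried over from the question, not confirmed):

```sql
SELECT location.zone, count(*)
FROM my_table
CROSS JOIN UNNEST(items) AS a(inner_items)          -- outer array -> inner arrays
CROSS JOIN UNNEST(inner_items) AS b(item, quantity, location)  -- inner array of rows -> columns
GROUP BY 1;
```

The second UNNEST expands the row fields into columns, after which the nested field location.zone is reachable, matching what BigQuery does implicitly.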

Presto query error on hive ORC, Can not read SQL type real from ORC stream of type DOUBLE

Submitted by ぃ、小莉子 on 2021-01-29 01:51:50
Question: I was able to run a query in Presto to read the non-float columns from a Hive ORC (Snappy) table. However, when I select any float-typed column through the Presto CLI, I get the error message below. Any suggestions for an alternative, other than changing the field type to double in the target Hive table?

presto:sample> select * from emp_detail;
Query 20200107_112537_00009_2zpay failed: Error opening Hive split hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part
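The error indicates the metastore declares the column as real (Hive float) while the ORC files physically store DOUBLE, and Presto refuses the narrowing read. If altering the original table is off the table, one hedged workaround is a second table definition over the same files with the column declared as the type the files actually contain, casting back down in Presto if needed. A sketch (column names, and the assumption that the data directory can be exposed this way, are hypothetical):

```sql
-- run in Hive: a second external table over the same ORC data,
-- with the float column declared as the DOUBLE the files really hold
CREATE EXTERNAL TABLE emp_detail_double (
  emp_id int,
  salary double
)
STORED AS ORC
LOCATION 'hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail';

-- then in Presto, narrow explicitly if the real type is required:
SELECT emp_id, CAST(salary AS real) AS salary FROM emp_detail_double;
```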
