presto

Cannot connect/query from Presto on AWS EMR with Java JDBC

南笙酒味 submitted on 2021-01-28 12:22:20
Question: If I SSH into the master node of my Presto EMR cluster, I can run queries. However, I would like to run queries from Java source code on my local machine that connects to the EMR cluster. I set up my Presto EMR cluster with the default configuration. I have tried port forwarding, but it still does not seem to work. When I create the connection and print it, I get "com.facebook.presto.jdbc.PrestoConnection@XXXXXXX", but I still doubt whether it is actually connected since I can't

Spark Small ORC Stripes

跟風遠走 submitted on 2021-01-28 11:58:32
Question: We use Spark to flatten clickstream data and then write it to S3 in ORC+zlib format. I have tried changing many Spark settings, but the stripes of the resulting ORC files are still very small (<2MB). Things I have tried so far to increase the stripe size: earlier each file was 20MB in size; using coalesce I am now creating files of 250-300MB, but there are still 200 stripes per file, i.e. each stripe is <2MB. Tried using HiveContext instead of

Athena: $path vs. partition

不想你离开。 submitted on 2021-01-28 10:55:47
Question: I'm storing daily reports per client for querying with Athena. At first I thought I'd use a client=c_1/month=12/day=01/ or client=c2/date=2020-12-01/ folder structure and run MSCK REPAIR TABLE daily to make each new day's partition available for querying. Then I realized there's the $path special column, so if I store files as 2020-12-01.csv I could run a query with WHERE $path LIKE '%12-01%', thus saving a partition and the need to detect/add it daily. I can see this having an impact on performance if

Effective way in PRESTO to Result output with Boolean values

会有一股神秘感。 submitted on 2021-01-28 10:25:47
Question: I'm trying to create an output of boolean values based on some conditions. For example, I have 3 rules/conditions from different tables which are not related to each other.
Rule 1: Select USER_NAME, ID from session_user where age > 25
Rule 2: Select USER_NAME, ID from current_user where plan = 'gold'
Rule 3: Select USER_NAME, ID from customer where group_name='managers'
My output should be:
USER_NAME | ID | Rule 1 | Rule 2 | Rule 3
user1 | 1 | true | false | true
user2 | 2 | false | true | true
user3 | 3 | true
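The desired result amounts to taking the union of all users returned by the three rules and flagging, per user, which rule results contain them. A minimal sketch of that logic in Python follows; it is not a Presto query, and the rule results below are hypothetical stand-ins for what the three SELECT statements would return, chosen to match the user1 and user2 rows of the expected output.

```python
def rule_flags(*rule_rows):
    """Return one row per (user_name, id) with one boolean per rule.

    Each argument is the set of (user_name, id) tuples returned by one rule.
    """
    # Every user that appears in at least one rule result, in a stable order.
    all_users = sorted(set().union(*rule_rows))
    # Append one membership flag per rule to each (user_name, id) tuple.
    return [user + tuple(user in rows for rows in rule_rows) for user in all_users]


# Hypothetical rule results (assumed data, not taken from real tables).
rule1 = {("user1", 1)}
rule2 = {("user2", 2)}
rule3 = {("user1", 1), ("user2", 2)}

for row in rule_flags(rule1, rule2, rule3):
    print(row)
```

In SQL the same shape is usually reached by combining the three result sets and deriving one boolean column per rule, but the exact query depends on details the question does not show here.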

Effective way in PRESTO to Result output with Boolean values

自闭症网瘾萝莉.ら submitted on 2021-01-28 10:23:40
Question: I'm trying to create an output of boolean values based on some conditions. For example, I have 3 rules/conditions from different tables which are not related to each other.
Rule 1: Select USER_NAME, ID from session_user where age > 25
Rule 2: Select USER_NAME, ID from current_user where plan = 'gold'
Rule 3: Select USER_NAME, ID from customer where group_name='managers'
My output should be:
USER_NAME | ID | Rule 1 | Rule 2 | Rule 3
user1 | 1 | true | false | true
user2 | 2 | false | true | true
user3 | 3 | true

How to show incremental changes in SQL

妖精的绣舞 submitted on 2021-01-28 02:22:11
Question: I have a SQL (Presto) table that contains data about different bird species. The table records how often the different species have been observed in a park over different months. Some of the species have not been observed in certain months, but measurements were only performed on 07-9-19, 08-10-19, and 05-11-19. A sample of the data would look like this:
Species | Period | Total (Cumulative) Observations (TCO)
Bird1 | 07-9-19 | 33
Bird1 | 08-10-19 | 45
Bird1 | 05-11-19 | 60
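Assuming "incremental changes" means the difference between each cumulative total and the previous one for the same species, the arithmetic can be sketched in Python as below. This is an illustration of the calculation rather than a Presto query; in Presto the analogous tool would be the lag window function, e.g. TCO - lag(TCO) OVER (PARTITION BY species ORDER BY period). The function name and the choice of None for the first period are assumptions.

```python
from itertools import groupby


def incremental_changes(rows):
    """Add the per-period change to cumulative totals.

    rows: (species, period, cumulative_total) tuples, already ordered by
    species and then chronologically within each species.
    """
    result = []
    for species, group in groupby(rows, key=lambda r: r[0]):
        previous = None  # no earlier measurement for the first period
        for _, period, total in group:
            change = None if previous is None else total - previous
            result.append((species, period, total, change))
            previous = total
    return result


# Sample rows from the question.
rows = [
    ("Bird1", "07-9-19", 33),
    ("Bird1", "08-10-19", 45),
    ("Bird1", "05-11-19", 60),
]

for row in incremental_changes(rows):
    print(row)
```

For the Bird1 sample this gives no change for the first period, then 12 and 15.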

How do I enforce ordering (ORDER BY) in a custom Presto Aggregation Function

筅森魡賤 submitted on 2021-01-28 01:45:34
Question: I am writing a custom Presto aggregation function that produces the correct result if (and only if) the values are sorted in ascending order by the value that I am aggregating on. That is, the following will work: SELECT key, MY_AGG_FUNC(value ORDER BY value ASC) FROM my_table GROUP BY key. The following will yield an incorrect result: SELECT key, MY_AGG_FUNC(value) FROM my_table GROUP BY key. When developing MY_AGG_FUNC, is there a way to enforce ORDER BY value ASC internally without relying

Add rows between two dates Presto

若如初见. submitted on 2021-01-23 11:05:51
Question: I have a table that has 3 columns: start, end and emp_num. I want to generate a new table which has all dates between these dates for every employee. I need to use Presto. I referred to this link: inserting dates into a table between a start and end date in Presto. I tried using the unnest function on a generated sequence, but I don't know how to create the sequence by pulling dates from two columns in another table. select unnest(seq) as t(days) from (select sequence(start, end, interval '1' day) as seq
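The target result is one row per employee per day in the inclusive range from start to end. A minimal Python sketch of that expansion follows, to make the expected output concrete; it is not Presto code, and the function name and sample row are hypothetical. In Presto, UNNEST belongs in the FROM clause rather than the SELECT list, so the usual form is along the lines of CROSS JOIN UNNEST(sequence(start, end, interval '1' day)) AS t(days).

```python
from datetime import date, timedelta


def expand_dates(rows):
    """Expand (start, end, emp_num) rows into one (emp_num, day) row per day.

    Both endpoints are included, matching an inclusive date sequence.
    """
    expanded = []
    for start, end, emp_num in rows:
        day = start
        while day <= end:
            expanded.append((emp_num, day))
            day += timedelta(days=1)
    return expanded


# Hypothetical sample row: employee 7, 1 to 3 January 2021.
sample = [(date(2021, 1, 1), date(2021, 1, 3), 7)]

for row in expand_dates(sample):
    print(row)
```

A row whose start is after its end produces no output rows in this sketch.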

Add rows between two dates Presto

社会主义新天地 submitted on 2021-01-23 11:04:32
Question: I have a table that has 3 columns: start, end and emp_num. I want to generate a new table which has all dates between these dates for every employee. I need to use Presto. I referred to this link: inserting dates into a table between a start and end date in Presto. I tried using the unnest function on a generated sequence, but I don't know how to create the sequence by pulling dates from two columns in another table. select unnest(seq) as t(days) from (select sequence(start, end, interval '1' day) as seq

Presto SQL - How can i get all possible combination of an array?

空扰寡人 submitted on 2021-01-22 09:31:43
Question: I want all possible combinations of n elements from a given array. I tried using some of Presto's predefined functions, like array_agg(x).
Input: [1,2,3,4]
Output:
when n=2: [[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
when n=3: [[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
when n=4: [[1,2,3,4]] or [1,2,3,4]
Answer 1: There is a combinations(array(T), n) function and it does exactly what you want: select combinations(array[1,2,3,4],2);
Source: https://stackoverflow.com/questions/56540393/presto-sql-how-can-i
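To check the expected outputs listed in the question outside Presto, the same n-element sub-groups can be produced in Python with itertools.combinations. This is only a cross-check sketch, not the Presto function itself; the helper name n_combinations is made up for illustration.

```python
from itertools import combinations


def n_combinations(values, n):
    """Return every n-element combination of values, as lists, in input order."""
    return [list(combo) for combo in combinations(values, n)]


for n in (2, 3, 4):
    print(n, n_combinations([1, 2, 3, 4], n))
```

For the input [1,2,3,4] this reproduces the three outputs given in the question for n=2, n=3 and n=4.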