aggregate-functions

PostgreSQL 9.3: Display result in a specific format using the array_agg function

Submitted by 风流意气都作罢 on 2020-02-05 13:52:48
Question: I want to display the records in the following table in the specific format shown below.

Creating table Test_1:

    CREATE TABLE Test_1 (
        ColumnA varchar,
        ColumnB varchar
    );

Inserting records:

    INSERT INTO Test_1 VALUES
        ('A101','B101'), ('A102','B102'),
        ('A103','B103'), ('A104','B104'),
        ('A105','B105'), ('A106','B106'),
        ('A107','B107'), ('A108','B108'),
        ('A109','B109'), ('A201','B201');

I want to show the result like this:

Expected result:

    ColumnA ColumnX
    -----------------
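
The expected output is cut off above, so the exact target format is unclear. As one plausible reading (grouping rows by the first two characters of ColumnA and collapsing the matching ColumnB values into a comma-separated list), a minimal array_agg sketch might look like the following; the prefix grouping and the output column names are assumptions, not taken from the original post:

    -- Sketch: one row per ColumnA prefix, ColumnB values collapsed via array_agg.
    -- Runs on PostgreSQL 9.3 (ORDER BY inside an aggregate is supported since 9.0).
    SELECT left(ColumnA, 2) AS ColumnA,
           array_to_string(array_agg(ColumnB ORDER BY ColumnB), ', ') AS ColumnX
    FROM   Test_1
    GROUP  BY left(ColumnA, 2);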

Aggregate duplicate records while maintaining order, and also include duplicate records

Submitted by ぐ巨炮叔叔 on 2020-02-04 03:57:53
Question: I am trying to solve an interesting problem. It is easy to do a groupBy for aggregations such as sum and count, but this problem is slightly different. Let me explain: this is my list of tuples:

    val repeatSmokers: List[(String, String, String, String, String, String)] =
      List(
        ("ID76182", "sachin", "kita MR.", "56308", "1990", "300"),
        ("ID76182", "KOUN", "Jana MR.", "56714", "1990", "100"),
        ("ID76182", "GANGS", "SKILL", "27539", "1990", "255"),
        ("ID76182", "GANGS", "SKILL", "27539", "1990",
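
The list above is truncated mid-tuple, so the exact requirement is partly guesswork. Below is a minimal Scala sketch under these assumptions: the key is every field except the last (the amount), amounts of identical records are summed, and groups are emitted in first-seen order (which a plain groupBy would not guarantee). The final tuple's amount ("110") is a placeholder for the value lost in the cut:

    // Sketch: sum the amount per identical record while preserving first-seen order.
    type Key = (String, String, String, String, String)

    val smokers: List[(String, String, String, String, String, String)] = List(
      ("ID76182", "sachin", "kita MR.", "56308", "1990", "300"),
      ("ID76182", "KOUN", "Jana MR.", "56714", "1990", "100"),
      ("ID76182", "GANGS", "SKILL", "27539", "1990", "255"),
      ("ID76182", "GANGS", "SKILL", "27539", "1990", "110")  // placeholder amount
    )

    val (order, totals) = smokers.foldLeft((Vector.empty[Key], Map.empty[Key, Int])) {
      case ((ord, acc), (id, first, last, zip, year, amt)) =>
        val key = (id, first, last, zip, year)              // everything but the amount
        val nextOrd = if (acc.contains(key)) ord else ord :+ key
        (nextOrd, acc.updated(key, acc.getOrElse(key, 0) + amt.toInt))
    }

    order.foreach(k => println(s"$k -> ${totals(k)}"))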

Aggregate over column arrays in DataFrame in PySpark?

Submitted by Deadly on 2020-01-31 18:14:32
Question: Let's say I have the following DataFrame:

    [Row(user='bob', values=[0.5, 0.3, 0.2]),
     Row(user='bob', values=[0.1, 0.3, 0.6]),
     Row(user='bob', values=[0.8, 0.1, 0.1])]

I would like to groupBy user and do something like avg(values), where the average is taken over each index of the array values, like this:

    [Row(user='bob', averages=[0.466667, 0.233333, 0.3])]

How can I do this in PySpark?

Answer 1: You can expand the array and compute the average for each index.

Python:

    from pyspark.sql.functions import array
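
Completing the idea in the truncated answer, here is a runnable PySpark sketch. It assumes every values array has the same known, fixed length (3 here) and a local SparkSession; the averages alias is illustrative:

    # Sketch: average the arrays position-by-position per user.
    from pyspark.sql import Row, SparkSession
    from pyspark.sql.functions import array, avg, col

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([
        Row(user='bob', values=[0.5, 0.3, 0.2]),
        Row(user='bob', values=[0.1, 0.3, 0.6]),
        Row(user='bob', values=[0.8, 0.1, 0.1]),
    ])

    n = 3  # known, fixed array length
    result = df.groupBy('user').agg(
        array(*[avg(col('values')[i]) for i in range(n)]).alias('averages')
    )
    result.show(truncate=False)  # => bob: [0.4667, 0.2333, 0.3]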

MAX() and MAX() OVER (PARTITION BY) produce error 3504 in a Teradata query

Submitted by 只谈情不闲聊 on 2020-01-31 08:52:26
Question: I am trying to produce a results table with the last completed course date for each course code, as well as the last completed course date overall, for each employee. Below is my query:

    SELECT employee_number,
           MAX(course_completion_date) OVER (PARTITION BY course_code) AS max_course_date,
           MAX(course_completion_date) AS max_date
    FROM   employee_course_completion
    WHERE  course_code IN ('M910303', 'M91301R', 'M91301P')
    GROUP  BY employee_number

This query produces the following error:

    3504 : Selected
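
Teradata raises 3504 here because the window function references course_code while the GROUP BY lists only employee_number. One way around it (a sketch, not necessarily the poster's final query) is to express both results as window aggregates and deduplicate, avoiding GROUP BY entirely:

    -- Sketch: both maxima as window functions; DISTINCT collapses repeated rows.
    SELECT DISTINCT
           employee_number,
           course_code,
           MAX(course_completion_date) OVER (PARTITION BY course_code)     AS max_course_date,
           MAX(course_completion_date) OVER (PARTITION BY employee_number) AS max_date
    FROM   employee_course_completion
    WHERE  course_code IN ('M910303', 'M91301R', 'M91301P');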

Convert numeric result to percentage with decimal digits

Submitted by 蹲街弑〆低调 on 2020-01-25 20:17:31
Question: I have this query:

    SELECT count(incidentnumber) AS average
    FROM   incidents
    WHERE  IncidentStationGround <> firstpumparriving_deployedfromstation;

I get a result of something like 20,000. How can I convert this number to a percentage? Also, can I have the result with decimal digits?

Answer 1: Your query in the comment should work; cast the count to decimal to get a decimal percentage:

    count(incidentnumber)::decimal * 100

Answer 2: Assuming you want a percentage of the total count:

    SELECT (count(incidentnumber) FILTER (WHERE
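
Completing the FILTER idea from Answer 2, here is one sketch (PostgreSQL 9.4+, where aggregate FILTER clauses exist) that computes the mismatched incidents as a percentage of all incidents, rounded to two decimal places; the output alias is made up for illustration:

    -- Sketch: share of incidents whose first pump deployed from a different station.
    SELECT round(
             count(*) FILTER (WHERE IncidentStationGround <> firstpumparriving_deployedfromstation)
             * 100.0 / count(*),
             2) AS pct_mismatched
    FROM incidents;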

SQL select only rows with max value on a column [duplicate]

Submitted by 狂风中的少年 on 2020-01-25 07:07:08
Question: This question already has answers here: Retrieving the last record in each group - MySQL (27 answers). Closed 10 months ago.

I have this table for documents (simplified version here):

    +------+-------+--------------------------------------+
    | id   | rev   | content                              |
    +------+-------+---------------------
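
The table is cut off above, but this is the classic greatest-n-per-group problem that the linked duplicate covers. A standard MySQL sketch (the table name docs is hypothetical; the original post's name is not visible) joins each row against its group's maximum rev:

    -- Sketch: keep only the row with the highest rev per id.
    SELECT d.id, d.rev, d.content
    FROM   docs AS d
    JOIN  (SELECT id, MAX(rev) AS max_rev
           FROM   docs
           GROUP  BY id) AS m
      ON   m.id = d.id
     AND   m.max_rev = d.rev;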
