aggregate-functions

Is it possible to compute “distinct sum” and “distinct average” in elasticsearch?

岁酱吖の posted on 2020-05-30 08:05:09
Question: How can I calculate a "distinct average" in Elasticsearch? I have some denormalized data like this:

    { "record_id" : "100", "cost" : 42 }
    { "record_id" : "200", "cost" : 67 }
    { "record_id" : "200", "cost" : 67 }
    { "record_id" : "200", "cost" : 67 }
    { "record_id" : "400", "cost" : 11 }
    { "record_id" : "400", "cost" : 11 }
    { "record_id" : "500", "cost" : 10 }
    { "record_id" : "600", "cost" : 99 }

Notice how the "cost" is always the same for a given "record_id". So with the above data: How can I
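
To make the terms concrete, here is a minimal Python sketch of what a "distinct sum" and "distinct average" would mean for the sample data above. It is not an Elasticsearch query; the helper name `distinct_costs` is hypothetical, and the sketch assumes, as the question states, that "cost" never varies within a "record_id".

```python
from statistics import mean

# Sample documents copied from the question above.
docs = [
    {"record_id": "100", "cost": 42},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "400", "cost": 11},
    {"record_id": "400", "cost": 11},
    {"record_id": "500", "cost": 10},
    {"record_id": "600", "cost": 99},
]


def distinct_costs(rows):
    """Keep one cost per record_id (hypothetical helper).

    Assumes the cost is constant within a record_id, so the first
    occurrence is as good as any other.
    """
    by_id = {}
    for row in rows:
        by_id.setdefault(row["record_id"], row["cost"])
    return by_id


costs = list(distinct_costs(docs).values())
distinct_sum = sum(costs)                      # 42 + 67 + 11 + 10 + 99 = 229
distinct_avg = mean(costs)                     # 229 / 5 = 45.8
naive_avg = mean(row["cost"] for row in docs)  # 374 / 8 = 46.75

print(distinct_sum, distinct_avg, naive_avg)
```

The gap between 45.8 and 46.75 is the skew introduced by the repeated rows, which is what a distinct aggregation is meant to remove.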

Grouped table of percentiles [duplicate]

本小妞迷上赌 posted on 2020-05-17 06:44:47
Question: (Marked as a duplicate of "ddply multiple quantiles by group".) I need to calculate which values represent the 5%, 34%, 50%, 67% and 95% percentiles within each group (in separate columns). An expected output would be

         5%  34%  50%  67%  95%
    A     4    6    8   12   30
    B     1    2    3    4   10

for integer values for each group. The code below shows what I have so far (but using generated data):

    library(dplyr)
    library(tidyr)
    data.frame(group=sample(LETTERS[1:5],100,TRUE),values
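
The question is about R, but the shape of the computation can be sketched in Python with only the standard library. This is an illustration under assumptions, not the answer from the linked duplicate: `grouped_percentiles` and `PERCENTS` are hypothetical names, and `statistics.quantiles` with its default "exclusive" method may interpolate slightly differently from R's default `quantile`.

```python
import random
from collections import defaultdict
from statistics import quantiles

PERCENTS = (5, 34, 50, 67, 95)  # the percentiles asked for in the question


def grouped_percentiles(rows, percents=PERCENTS):
    """Return {group: {percent: value}} for an iterable of (group, value) pairs.

    Each group needs at least two values, otherwise quantiles() raises
    StatisticsError.
    """
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)

    table = {}
    for group, values in sorted(by_group.items()):
        # n=100 yields 99 cut points; cuts[p - 1] is the p-th percentile.
        cuts = quantiles(values, n=100)
        table[group] = {p: cuts[p - 1] for p in percents}
    return table


# Generated data, loosely mirroring the question's sample(LETTERS[1:5], 100, TRUE).
random.seed(0)
rows = [(random.choice("ABCDE"), random.randint(1, 100)) for _ in range(100)]

for group, row in grouped_percentiles(rows).items():
    print(group, *(f"{p}%={row[p]:.1f}" for p in PERCENTS))
```

Each group becomes one row and each requested percentile one column, which matches the wide layout of the expected output.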

Complex nested aggregations to get order totals

限于喜欢 posted on 2020-05-17 03:03:35
Question: I have a system to track orders and related expenditures. This is a Rails app running on PostgreSQL. 99% of my app gets by with plain old Rails Active Record calls. This one is ugly. The expenditures table looks like this:

    +----+----------+-----------+------------------------+
    | id | category | parent_id | note                   |
    +----+----------+-----------+------------------------+
    | 1  | order    | nil       | order with no invoices |
    +----+----------+-----------+------------------------+
    | 2  | order    | nil       | order

Aggregate a Spark data frame using an array of column names, retaining the names

∥☆過路亽.° posted on 2020-04-13 17:21:51
Question: I would like to aggregate a Spark data frame using an array of column names as input, while retaining the original names of the columns.

    df.groupBy($"id").sum(colNames:_*)

This works but fails to preserve the names. Inspired by the answer found here, I unsuccessfully tried this:

    df.groupBy($"id").agg(sum(colNames:_*).alias(colNames:_*))

    error: no `: _*' annotation allowed here

It works when taking a single element, like

    df.groupBy($"id").agg(sum(colNames(2)).alias(colNames(2)))

How can

Can I use aggregate expressions using SQLAlchemy?

本小妞迷上赌 posted on 2020-03-16 06:16:08
Question: PostgreSQL has aggregate expressions, e.g. count(*) FILTER (WHERE state = 'success'). How can I generate such expressions using SQLAlchemy?

Answer 1: Suppose I have a model Machine with a boolean field active, and would like to filter the count by active = true.

Using func.count(...).filter(...):

    from models import db, Machine
    from sqlalchemy.sql import func

    query = db.session.query(
        func.count(Machine.id).filter(Machine.active == True)
        .label('active_machines')
    )

We can look at the generated SQL

MySql Query: include days that have COUNT(id) == 0 but only in the last 30 days

守給你的承諾、 posted on 2020-02-21 15:03:46
Question: I am running a query to get the number of builds per day from our database for the last 30 days. We now also need to mark the days on which there were no builds. In my WHERE clause I use submittime to determine whether there were builds. How could I modify this to include days that have COUNT(id) == 0, but only in the last 30 days?

Original Query:

    SELECT COUNT(id) AS 'Past-Month-Builds',
           CONCAT(MONTH(submittime), '-', DAY(submittime)) as 'Month-Day'
    FROM builds
    WHERE DATE(submittime) >=
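
A GROUP BY over the builds table can only return days that have at least one row, so the empty days have to come from somewhere else, typically a generated list of dates that the counts are joined onto. The Python sketch below illustrates that idea outside the database; it is not the MySQL answer. `builds_per_day` is a hypothetical helper, the sample timestamps are made up, and the window is assumed to be the 30 calendar days ending today, since the original WHERE clause is cut off.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def builds_per_day(submit_times, today, days=30):
    """Return [(day, count)] for the `days` calendar days ending at `today`.

    Days without any build appear with a count of 0 (hypothetical helper).
    """
    start = today - timedelta(days=days - 1)
    # Count builds per calendar day, ignoring anything outside the window.
    counts = Counter(
        ts.date() for ts in submit_times if start <= ts.date() <= today
    )
    # Walk every day in the window; Counter yields 0 for missing days.
    window = (start + timedelta(days=i) for i in range(days))
    return [(day, counts[day]) for day in window]


# Made-up submit times standing in for the builds.submittime column.
today = date(2020, 2, 21)
submit_times = [
    datetime(2020, 2, 21, 9, 0),
    datetime(2020, 2, 21, 15, 30),
    datetime(2020, 2, 19, 12, 0),
    datetime(2019, 12, 1, 8, 0),  # older than 30 days, so it is ignored
]

report = builds_per_day(submit_times, today)
for day, n in report[-3:]:
    # Same 'Month-Day' label shape as the CONCAT(MONTH(...), '-', DAY(...)) column.
    print(f"{day.month}-{day.day}", n)
```

In SQL the same shape is usually obtained by LEFT JOINing the grouped counts onto a table or generated series of the last 30 dates, so that unmatched dates survive with a count of zero.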
