aggregate-functions | 易学教程

Is it possible to compute “distinct sum” and “distinct average” in elasticsearch?

Question: How can I calculate a "distinct average" in elasticsearch? I have some denormalized data like this:

    { "record_id" : "100", "cost" : 42 }
    { "record_id" : "200", "cost" : 67 }
    { "record_id" : "200", "cost" : 67 }
    { "record_id" : "200", "cost" : 67 }
    { "record_id" : "400", "cost" : 11 }
    { "record_id" : "400", "cost" : 11 }
    { "record_id" : "500", "cost" : 10 }
    { "record_id" : "600", "cost" : 99 }

Notice how the "cost" is always the same for a given "record_id". So with the above data: How can I
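To make the target quantity concrete, here is a minimal sketch in plain Python, not an Elasticsearch query. It assumes "distinct" means counting each "record_id" once, which the excerpt suggests but does not spell out. The function name distinct_cost_stats is hypothetical.

```python
def distinct_cost_stats(records):
    """Return (distinct sum, distinct average) of cost, one cost per record_id.

    Assumption: cost is identical for every row sharing a record_id,
    as the question states, so keeping the last value seen is safe.
    """
    cost_by_id = {}
    for record in records:
        cost_by_id[record["record_id"]] = record["cost"]
    total = sum(cost_by_id.values())
    return total, total / len(cost_by_id)


# The sample rows from the question.
records = [
    {"record_id": "100", "cost": 42},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "400", "cost": 11},
    {"record_id": "400", "cost": 11},
    {"record_id": "500", "cost": 10},
    {"record_id": "600", "cost": 99},
]

distinct_sum, distinct_avg = distinct_cost_stats(records)
print(distinct_sum, distinct_avg)  # 229 45.8
```

For comparison, a plain sum over all eight rows gives 374 and a plain average gives 46.75, which is why the duplicates matter.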

Grouped table of percentiles [duplicate]

Question: This question already has answers here: ddply multiple quantiles by group (4 answers). Closed 4 months ago.

I need to calculate which value represents the 5%, 34%, 50%, 67% and 95% percentile within the group (in separate columns). An expected output would be

         5%  34%  50%  67%  95%
    A     4    6    8   12   30
    B     1    2    3    4   10

for integer values for each group. The code below shows what I have so far (but using generated data):

    library(dplyr)
    library(tidyr)
    data.frame(group=sample(LETTERS[1:5],100,TRUE),values
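The question is asked in R; as a language-neutral illustration of the desired result, here is a minimal sketch in Python using only the standard library. The function name percentile_table and the sample data are hypothetical. The "inclusive" method of statistics.quantiles is chosen on the assumption that linear interpolation between observed values is wanted; other percentile definitions give slightly different numbers.

```python
from statistics import quantiles


def percentile_table(rows, percents=(5, 34, 50, 67, 95)):
    """Map each group to {percent: value} for the requested percentiles.

    rows is an iterable of (group, value) pairs. Each group needs at
    least two values for statistics.quantiles to work.
    """
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)

    table = {}
    for group in sorted(groups):
        # n=100 yields 99 cut points; cut point p-1 is the p-th percentile.
        cuts = quantiles(groups[group], n=100, method="inclusive")
        table[group] = {p: cuts[p - 1] for p in percents}
    return table


# Hypothetical data: group A holds 1..101, group B holds 0, 2, ..., 200.
rows = [("A", v) for v in range(1, 102)] + [("B", v) for v in range(0, 201, 2)]
table = percentile_table(rows)
for group, row in table.items():
    print(group, row)
```

Each inner dictionary corresponds to one row of the wide table in the question, with one column per percentile.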

Complex nested aggregations to get order totals

Question: I have a system to track orders and related expenditures. This is a Rails app running on PostgreSQL. 99% of my app gets by with plain old Rails Active Record calls etc. This one is ugly. The expenditures table looks like this:

    +----+----------+-----------+------------------------+
    | id | category | parent_id | note                   |
    +----+----------+-----------+------------------------+
    | 1  | order    | nil       | order with no invoices |
    +----+----------+-----------+------------------------+
    | 2  | order    | nil       | order

Aggregate a Spark data frame using an array of column names, retaining the names

Question: I would like to aggregate a Spark data frame using an array of column names as input, and at the same time retain the original names of the columns.

    df.groupBy($"id").sum(colNames:_*)

This works but fails to preserve the names. Inspired by the answer found here, I unsuccessfully tried this:

    df.groupBy($"id").agg(sum(colNames:_*).alias(colNames:_*))

    error: no `: _*' annotation allowed here

It works for a single element, like

    df.groupBy($"id").agg(sum(colNames(2)).alias(colNames(2)))

How can

Can I use aggregate expressions using SQLAlchemy?

Question: PostgreSQL has aggregate expressions, e.g. count(*) FILTER (WHERE state = 'success'). How can I generate such expressions using SQLAlchemy?

Answer 1: Suppose I have a model Machine with a boolean field active, and would like to filter the count by active = true. Using func.count(...).filter(...):

    from models import db, Machine
    from sqlalchemy.sql import func

    query = db.session.query(
        func.count(Machine.id).filter(Machine.active == True)
        .label('active_machines')
    )

We can look at the generated SQL

MySql Query: include days that have COUNT(id) == 0 but only in the last 30 days

Question: I am running a query to get the number of builds per day from our database for the last 30 days. It has now become necessary to also mark days on which there were no builds. In my WHERE clause I use submittime to determine whether there were builds. How could I modify this to include days that have COUNT(id) == 0, but only in the last 30 days?

Original Query:

    SELECT COUNT(id) AS 'Past-Month-Builds',
           CONCAT(MONTH(submittime), '-', DAY(submittime)) as 'Month-Day'
    FROM builds
    WHERE DATE(submittime) >=
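A GROUP BY over the builds table can only return days that have at least one row, so the zero-count days have to come from somewhere else. One common approach is to generate the full list of days first and then attach the counts to it. The sketch below illustrates that idea in plain Python rather than MySQL. The function name builds_per_day and the sample dates are hypothetical, and "last 30 days" is assumed to mean today plus the 29 days before it.

```python
from collections import Counter
from datetime import date, timedelta


def builds_per_day(submit_dates, today, days=30):
    """Return [(day, build_count)] for the window ending at today, oldest first.

    Days without any build appear with a count of 0. Builds outside the
    window are ignored because only window days are looked up.
    """
    counts = Counter(submit_dates)
    window = [today - timedelta(days=offset) for offset in range(days - 1, -1, -1)]
    return [(day, counts.get(day, 0)) for day in window]


# Hypothetical submit dates, already truncated to calendar days.
today = date(2024, 3, 31)
submit_dates = [
    date(2024, 3, 31),
    date(2024, 3, 31),
    date(2024, 3, 29),
    date(2024, 1, 1),  # older than 30 days, so it is not reported
]

report = builds_per_day(submit_dates, today)
for day, count in report[-3:]:
    print(f"{day.month}-{day.day}", count)
```

In SQL the same shape is usually obtained by left-joining the builds onto a table or generated series of dates, so that the date side drives the result.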
