aggregate-functions

Distributed for loop in pyspark dataframe

大城市里の小女人 submitted on 2020-01-15 12:17:28
Question: Context: My company is on Spark 2.2, so it's not possible to use pandas_udf for distributed column processing. I have dataframes that contain thousands of columns (features) and millions of records: df = spark.createDataFrame([(1,"AB", 100, 200,1), (2, "AC", 150,200,2), (3,"AD", 80,150,0)],["Id","Region","Salary", "HouseHoldIncome", "NumChild"]) I would like to compute summary statistics on each column in parallel and wonder what the best way to achieve this is. #The point is

PostgreSQL aggregate function over range

不想你离开。 submitted on 2020-01-15 11:53:10
Question: I am trying to create a function that will find the intersection of tsrange values, but I can't get it to work: CREATE AGGREGATE intersection ( tsrange ) ( SFUNC = *, STYPE = tsrange ) Answer 1: Your attempt needs two modifications. First, I don't think you can use an operator as the SFUNC, so you need to define a named function that does the intersection and use that: CREATE or REPLACE FUNCTION int_tsrange(a tsrange, b tsrange) returns tsrange language plpgsql as 'begin return a * b; end'; Secondly,
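The role of the state transition function (SFUNC) can be sketched outside the database: an aggregate folds that function over the rows, carrying a running state of type STYPE. The Python below is only an illustration of that idea, not PostgreSQL code. The `(lower, upper)` tuple representation, the use of `None` for an empty range, and the sample rows are all assumptions made for the sketch.

```python
from datetime import datetime
from functools import reduce
from typing import Iterable, Optional, Tuple

# Hypothetical stand-in for tsrange: a half-open interval [lower, upper).
Range = Tuple[datetime, datetime]


def int_range(a: Optional[Range], b: Optional[Range]) -> Optional[Range]:
    """Transition step: intersect the running state with the next row.

    None stands for an empty range in this sketch (it is not SQL NULL).
    """
    if a is None or b is None:
        return None
    lower, upper = max(a[0], b[0]), min(a[1], b[1])
    return (lower, upper) if lower < upper else None


def intersection(ranges: Iterable[Range]) -> Optional[Range]:
    """Fold the transition function over all rows, the way an aggregate does.

    The first row seeds the state; an empty input raises TypeError.
    """
    return reduce(int_range, ranges)


# Made-up sample rows.
rows = [
    (datetime(2020, 1, 1), datetime(2020, 1, 10)),
    (datetime(2020, 1, 5), datetime(2020, 1, 20)),
    (datetime(2020, 1, 3), datetime(2020, 1, 8)),
]
overlap = intersection(rows)
```

Here `overlap` is the window shared by all three rows. How the initial state is chosen inside PostgreSQL is a separate matter that this sketch does not model.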

Percentile calculation with a window function

瘦欲@ submitted on 2020-01-15 08:33:45
Question: I know you can get the average, total, min, and max over a subset of the data using a window function. But is it possible to get, say, the median or the 25th percentile instead of the average with a window function? Put another way, how do I rewrite this to get the id and the 25th or 50th percentile of sales within each district rather than the average? SELECT id, avg(sales) OVER (PARTITION BY district) AS district_average FROM t Answer 1: You can write this as an aggregation function
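What the question asks for can be sketched in plain Python: compute one percentile per district, then attach it to every row of that district, which is what a window function over PARTITION BY district would return. This is a sketch under assumptions, not the answer's SQL. The linear-interpolation rule, the function names, and the sample rows are choices made here for illustration.

```python
from collections import defaultdict


def percentile_cont(values, fraction):
    """Linearly interpolated percentile of a non-empty list, 0 <= fraction <= 1."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def with_district_percentile(rows, fraction):
    """Return (id, percentile of sales within the row's district) for every row."""
    by_district = defaultdict(list)
    for _id, district, sales in rows:
        by_district[district].append(sales)
    stats = {d: percentile_cont(v, fraction) for d, v in by_district.items()}
    return [(_id, stats[district]) for _id, district, _sales in rows]


# Made-up rows: (id, district, sales).
rows = [
    (1, "N", 10), (2, "N", 20), (3, "N", 30), (4, "N", 40),
    (5, "S", 100), (6, "S", 300),
]
quartiles = with_district_percentile(rows, 0.25)
medians = with_district_percentile(rows, 0.5)
```

Every row keeps its own id while sharing the district-level statistic, so the result has as many rows as the input, unlike a GROUP BY.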

Update with a Join, Group By, and Having

房东的猫 submitted on 2020-01-15 05:32:24
Question: The select statement executes with no errors or warnings. The update statement throws an error: Incorrect syntax near the keyword 'group'. select [sSVsys].[textUniqueWordCount], count(*) as [actCount] from [docSVsys] as [sSVsys] with (nolock) join [FTSindexWordOnce] with (nolock) on [sSVsys].[sID] = [FTSindexWordOnce].[sID] where [sSVsys].[sID] < 500000 group by [sSVsys].[sID], [sSVsys].[textUniqueWordCount] having [sSVsys].[textUniqueWordCount] <> count(*) update [sSVsys] set [sSVsys].
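An UPDATE statement has no GROUP BY clause of its own, so the usual way out is to do the counting in a subquery and compare against that. The sketch below shows the idea with Python's sqlite3 module rather than SQL Server, so the syntax differs from T-SQL. The table and column names come from the question, while the `wordID` column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docSVsys (sID INTEGER PRIMARY KEY, textUniqueWordCount INTEGER);
    CREATE TABLE FTSindexWordOnce (sID INTEGER, wordID INTEGER);  -- wordID is hypothetical
    INSERT INTO docSVsys VALUES (1, 2), (2, 5), (3, 1);
    INSERT INTO FTSindexWordOnce VALUES (1, 10), (1, 11), (2, 10), (2, 12), (2, 13), (3, 10);
""")

# The aggregate lives in a correlated subquery instead of a GROUP BY on the UPDATE.
# The IN filter mirrors the inner join: documents with no indexed words are left alone.
cursor = conn.execute("""
    UPDATE docSVsys
    SET textUniqueWordCount = (
        SELECT COUNT(*) FROM FTSindexWordOnce AS w WHERE w.sID = docSVsys.sID
    )
    WHERE sID < 500000
      AND sID IN (SELECT sID FROM FTSindexWordOnce)
      AND textUniqueWordCount <> (
        SELECT COUNT(*) FROM FTSindexWordOnce AS w WHERE w.sID = docSVsys.sID
    )
""")
changed = cursor.rowcount
rows = conn.execute(
    "SELECT sID, textUniqueWordCount FROM docSVsys ORDER BY sID"
).fetchall()
```

Only the document whose stored count disagrees with the actual row count is rewritten. In SQL Server the same shape is often written as an UPDATE joined to a grouped derived table, which is not shown here.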

linq aggregated nested count

风格不统一 submitted on 2020-01-15 05:14:10
Question: I have the following classes: class Outer { public ICollection<Inner> Inners } class Inner { public ICollection<Inner> Inners } I would like to order a list of Outers in descending order by the total count of their Inners and nested Inners. For example, if I have 2 Outers: the first has a collection of 3 Inners, each with 1 nested Inner, so its total is 3 + 1 + 1 + 1 = 6. The second has a collection of 2 Inners, each with 3 nested Inners, so its total is 2 + 3 + 3 = 8. Therefore in the returned
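The ordering described here amounts to a recursive count used as a sort key. The sketch below expresses that in Python rather than LINQ, with dataclasses standing in for the C# classes. The helper name `total_inners` and the two sample objects are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Inner:
    inners: List["Inner"] = field(default_factory=list)


@dataclass
class Outer:
    inners: List[Inner] = field(default_factory=list)


def total_inners(node) -> int:
    """Count every Inner below node, at any nesting depth."""
    return sum(1 + total_inners(child) for child in node.inners)


# Three inners with one nested inner each: 3 + 3 = 6.
first = Outer([Inner([Inner()]) for _ in range(3)])
# Two inners with three nested inners each: 2 + 3 + 3 = 8.
second = Outer([Inner([Inner() for _ in range(3)]) for _ in range(2)])

# Descending by total count, so the Outer with 8 comes first.
ordered = sorted([first, second], key=total_inners, reverse=True)
```

Because `Inner` can contain further `Inner` objects, the count recurses instead of stopping at two levels. A fixed two-level sum would be enough only if the nesting never goes deeper.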

How to select minimum UUID with left outer join?

杀马特。学长 韩版系。学妹 submitted on 2020-01-14 22:46:37
Question: I'm trying to select a row from a table which: has a minimum UUID; is not referenced in another table. But I'm having problems when I try to enforce the first constraint. Here's everything working as expected on integers. First, create tables that look like this:
t1
+----+---------+
| id | content |
+----+---------+
| 1  | a       |
| 2  | b       |
| 3  | c       |
+----+---------+
and t2
+----+---------+
| id | t1_id   |
+----+---------+
| 1  | 1       |
+----+---------+
postgres=# create table t1(id int, content varchar
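The integer case from the question can be reproduced as a small runnable sketch. It uses Python's sqlite3 module in place of PostgreSQL, and it picks the smallest id with ORDER BY ... LIMIT 1 rather than an aggregate, which is an assumption about one possible approach and not the asker's query. That form only needs the id type to be sortable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, content TEXT);
    CREATE TABLE t2 (id INTEGER, t1_id INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 1);
""")

# Anti-join: keep t1 rows with no match in t2, then take the smallest id.
row = conn.execute("""
    SELECT t1.id, t1.content
    FROM t1
    LEFT OUTER JOIN t2 ON t2.t1_id = t1.id
    WHERE t2.t1_id IS NULL
    ORDER BY t1.id
    LIMIT 1
""").fetchone()
```

Row 1 is referenced from t2, so the query skips it and returns the next smallest id.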

Postgres: get min, max, aggregate values in one select

南楼画角 submitted on 2020-01-14 03:47:07
Question: I am using PostgreSQL 8.4. I have a table like this:
 type | value
------+-------
    1 |     5
    2 |     6
    1 |     4
    3 |    10
I want to write one select that will give me the minimum and maximum value, and an aggregate of all the types as integer[]. The desired result should be:
 min | max | types
-----+-----+-----------
   4 |  10 | {1, 2, 3}
To get the min and max, I already have: SELECT MIN(value) min, MAX(value) max FROM table; To get the types in a standalone select, I use: SELECT array_agg(DISTINCT type) types
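All three values are aggregates over the same set of rows, so they can sit side by side in one SELECT list. The sketch below shows that with Python's sqlite3 module as a stand-in for PostgreSQL. SQLite has no array type, so `group_concat(DISTINCT type)` substitutes for `array_agg(DISTINCT type)` and the list is parsed in Python. The table name `readings` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (type INTEGER, value INTEGER);
    INSERT INTO readings VALUES (1, 5), (2, 6), (1, 4), (3, 10);
""")

# One pass over the table, three aggregates in the same select list.
lowest, highest, type_list = conn.execute(
    "SELECT MIN(value), MAX(value), group_concat(DISTINCT type) FROM readings"
).fetchone()

# group_concat returns a comma-separated string in no guaranteed order.
types = sorted(int(t) for t in type_list.split(","))
```

With the question's data this gives the minimum 4, the maximum 10, and the distinct types 1, 2, and 3, matching the desired result row.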

How to get definition/source code of an aggregate in PostgreSQL?

浪尽此生 submitted on 2020-01-13 11:34:00
Question: I found this related answer useful: Export "Create Aggregate" functions from PostgreSQL. But how do I get the CREATE AGGREGATE statement without a GUI client (e.g. with the psql command line)? Answer 1: Something like this, but I'm not sure if this covers all possible ways of creating an aggregate (it definitely does not take the need for quoted identifiers into account): SELECT 'create aggregate '||n.nspname||'.'||p.proname||'('||format_type(a.aggtranstype, null)||') (sfunc = '||a.aggtransfn ||', stype

How to get definition/source code of an aggregate in PostgreSQL?

[亡魂溺海] submitted on 2020-01-13 11:32:11
Question: I found this related answer useful: Export "Create Aggregate" functions from PostgreSQL. But how do I get the CREATE AGGREGATE statement without a GUI client (e.g. with the psql command line)? Answer 1: Something like this, but I'm not sure if this covers all possible ways of creating an aggregate (it definitely does not take the need for quoted identifiers into account): SELECT 'create aggregate '||n.nspname||'.'||p.proname||'('||format_type(a.aggtranstype, null)||') (sfunc = '||a.aggtransfn ||', stype

Find row with maximum value of id in MySQL

自闭症网瘾萝莉.ら submitted on 2020-01-13 10:53:51
Question: Take a look at the MySQL table below called "Articles":
+----+-----------+---------+------------------------+--------------------------+
| id | articleId | version | title                  | content                  |
+----+-----------+---------+------------------------+--------------------------+
| 1  | 1         | 0.0     | ArticleNo.1 title v0.0 | ArticleNo.1 content v0.0 |
| 2  | 1         | 1.0     | ArticleNo.1 title v1.0 | ArticleNo.1 content v1.0 |
| 3  | 1         | 1.5     | ArticleNo.1 title v1.5 | ArticleNo.1 content v1.5 |
| 4  | 1         | 2.0     | ArticleNo.1