Cumulative adding with dynamic base in Postgres

Submitted by 為{幸葍}努か on 2020-08-05 04:37:25

Question


I have the following scenario in Postgres (I'm using 9.4.1).

I have a table of this format:

create table test(
    id serial,
    val numeric not null,
    created timestamp not null default(current_timestamp),
    fk integer not null
);

I also have a numeric threshold field in another table, which should be used to label each row of test. Every row where the running total reaches the threshold (>= threshold) should be marked true, and a true row should reset the running total to 0 for the rows that follow, e.g.

Data set:

insert into test(val, created, fk) values
  (100, now() + interval '10 minutes', 5),
  (25,  now() + interval '20 minutes', 5),
  (30,  now() + interval '30 minutes', 5),
  (45,  now() + interval '40 minutes', 5),
  (10,  now() + interval '50 minutes', 5);

With a threshold of 50 I would like to get the output as:

100 -> true (as 100 > 50) [reset]
25  -> false (as 25 < 50)
30  -> true (as 25 + 30 > 50) [reset]
45  -> false (as 45 < 50)
10  -> true (as 45 + 10 > 50)

Is it possible to do this in a single SQL query? So far I have experimented with a window function:

select t.*,
       sum(t.val) over (
         partition by t.fk order by t.created
       ) as threshold_met
from test t
where t.fk = 5;

As you can see, I have got as far as a cumulative sum, and I suspect that tweaking the frame clause (rows between x preceding and current row) may be what I'm looking for. I just can't work out how to perform the reset, i.e. how to set x above to the appropriate value.


Answer 1:


Create your own aggregate function, which can be used as a window function.

Specialized aggregate function

It's easier than one might think:

CREATE OR REPLACE FUNCTION f_sum_cap50 (numeric, numeric)
  RETURNS numeric LANGUAGE sql AS
'SELECT CASE WHEN $1 > 50 THEN 0 ELSE $1 END + $2';

CREATE AGGREGATE sum_cap50 (numeric) (
  sfunc    = f_sum_cap50
, stype    = numeric
, initcond = 0
);

Then:

SELECT *, sum_cap50(val) OVER (PARTITION BY fk
                               ORDER BY created) > 50 AS threshold_met 
FROM   test
WHERE  fk = 5;

Result exactly as requested.
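
To see why this works, it can help to expose the running state next to the flag. The query below is a small sketch that assumes the test table, the five sample rows from the question, and the sum_cap50 aggregate defined above; the column name running_total and the window name w are only illustrative.

```sql
-- Show the capped running total alongside the boolean flag.
SELECT val
     , sum_cap50(val) OVER w      AS running_total
     , sum_cap50(val) OVER w > 50 AS threshold_met
FROM   test
WHERE  fk = 5
WINDOW w AS (PARTITION BY fk ORDER BY created)
ORDER  BY created;

-- With the sample data this should give:
--  val | running_total | threshold_met
--  100 |           100 | t
--   25 |            25 | f
--   30 |            55 | t
--   45 |            45 | f
--   10 |            55 | t
```

The state is reset lazily: a total above 50 is still returned for the current row, and the transition function starts again from 0 only when it processes the next row.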


Generic aggregate function

To make it work for any threshold and any (numeric) data type, and also allow NULL values:

CREATE OR REPLACE FUNCTION f_sum_cap (anyelement, anyelement, anyelement)
  RETURNS anyelement
  LANGUAGE sql STRICT AS
$$SELECT CASE WHEN $1 > $3 THEN '0' ELSE $1 END + $2;$$;

CREATE AGGREGATE sum_cap (anyelement, anyelement) (
  sfunc    = f_sum_cap
, stype    = anyelement
, initcond = '0'
);

Then, to call with a limit of, say, 110 with any numeric type:

SELECT *
     , sum_cap(val, '110') OVER (PARTITION BY fk
                                 ORDER BY created) AS capped_at_110
     , sum_cap(val, '110') OVER (PARTITION BY fk
                                 ORDER BY created) > 110 AS threshold_met 
FROM   test
WHERE  fk = 5;


Explanation

In your case we don't have to defend against NULL values since val is defined NOT NULL. If NULL can be involved, define f_sum_cap() as STRICT and it works because (per documentation):

If the state transition function is declared "strict", then it cannot be called with null inputs. With such a transition function, aggregate execution behaves as follows. Rows with any null input values are ignored (the function is not called and the previous state value is retained) [...]
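
A quick way to check the NULL handling is a throwaway row set. This sketch assumes the generic sum_cap aggregate from above has been created; the inline VALUES list and the column names ord and val are made up for the illustration.

```sql
-- The NULL row is skipped: the transition function is not called
-- and the previous state (40) is carried forward unchanged.
SELECT ord
     , val
     , sum_cap(val, '50') OVER (ORDER BY ord) AS running_total
FROM  (VALUES (1, 40::numeric)
            , (2, NULL)
            , (3, 20)) AS t(ord, val);

-- Expected, if the aggregate behaves as the quoted documentation describes:
--  ord | val | running_total
--    1 |  40 |            40
--    2 |     |            40
--    3 |  20 |            60
```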

Both the function and the aggregate take one more argument: the threshold. For the polymorphic variant it can be a hard-coded data type or the same polymorphic type as the leading arguments.

About polymorphic functions:

  • Initial array in function to aggregate multi-dimensional array

Note the use of untyped string literals, not numeric literals, which would default to integer!
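
The difference shows up as soon as val is not an integer column. The calls below are a sketch against the test table from the question, where val is numeric; the commented-out statement is expected to fail because all anyelement arguments must resolve to the same type.

```sql
-- Untyped string literal: resolved to numeric to match val.
SELECT sum_cap(val, '110') OVER (ORDER BY created) FROM test WHERE fk = 5;

-- An explicit cast to the column's type works as well.
SELECT sum_cap(val, 110::numeric) OVER (ORDER BY created) FROM test WHERE fk = 5;

-- A bare numeric literal is typed integer, so no sum_cap(numeric, integer) is found:
-- SELECT sum_cap(val, 110) OVER (ORDER BY created) FROM test WHERE fk = 5;
```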



Source: https://stackoverflow.com/questions/29417300/cumulative-adding-with-dynamic-base-in-postgres
