Retrieve last known value for each column of a row

Anonymous (unverified), submitted 2019-12-03 02:31:01

Question:

Not sure about the correct words to ask this question, so I will break it down.

I have a table as follows:

date_time | a | b | c 

Last 4 rows:

15/10/2013 11:45:00 | null   | 'timtim' | 'fred'
15/10/2013 13:00:00 | 'tune' | 'reco'   | null
16/10/2013 12:00:00 | 'abc'  | null     | null
16/10/2013 13:00:00 | null   | 'died'   | null

How would I get the last record, but with each null replaced by the most recent non-null value from an earlier record?

In my provided example the row returned would be

16/10/2013 13:00:00 | 'abc' | 'died' | 'fred' 

As you can see, if the value for a column is null, the query goes back to the last record that has a value for that column and uses that value.

This should be possible, I just can't figure it out. So far I have only come up with:

select last_value(a) over w a
from test
WINDOW w AS (
    partition by a
    ORDER BY ts asc
    range between current row and unbounded following
);

But this only caters for a single column ...

Answer 1:

This should work, but keep in mind it is an ugly solution. It orders by ctid, the physical location of a row, which is not guaranteed to reflect insertion order.

select * from
  (select dt from
     (select rank() over (order by ctid desc) idx, dt
      from sometable) cx
   where idx = 1) dtz,
  (select a from
     (select rank() over (order by ctid desc) idx, a
      from sometable where a is not null) ax
   where idx = 1) az,
  (select b from
     (select rank() over (order by ctid desc) idx, b
      from sometable where b is not null) bx
   where idx = 1) bz,
  (select c from
     (select rank() over (order by ctid desc) idx, c
      from sometable where c is not null) cx
   where idx = 1) cz

See it here at fiddle: http://sqlfiddle.com/#!15/d5940/40

The result will be

DT                                   A        B      C
October, 16 2013 00:00:00+0000      abc     died    fred


Answer 2:

Here I create an aggregation function that collects columns into arrays. Then it is just a matter of removing the NULLs and selecting the last element from each array.

Sample Data

CREATE TABLE T (
    date_time timestamp,
    a text,
    b text,
    c text
);

INSERT INTO T VALUES
('2013-10-15 11:45:00', NULL,   'timtim', 'fred'),
('2013-10-15 13:00:00', 'tune', 'reco',   NULL  ),
('2013-10-16 12:00:00', 'abc',  NULL,     NULL  ),
('2013-10-16 13:00:00', NULL,   'died',   NULL  );

Solution

CREATE AGGREGATE array_accum (anyelement) (
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

WITH latest_nonull AS (
    -- ORDER BY goes inside each aggregate call: a top-level ORDER BY date_time
    -- is rejected in an aggregate query that has no GROUP BY.
    SELECT MAX(date_time) AS MaxDateTime,
           array_remove(array_accum(a ORDER BY date_time), NULL) AS A,
           array_remove(array_accum(b ORDER BY date_time), NULL) AS B,
           array_remove(array_accum(c ORDER BY date_time), NULL) AS C
    FROM T
)
SELECT MaxDateTime, A[array_upper(A, 1)], B[array_upper(B, 1)], C[array_upper(C, 1)]
FROM latest_nonull;

Result

     maxdatetime     |  a  |  b   |  c
---------------------+-----+------+------
 2013-10-16 13:00:00 | abc | died | fred
(1 row)


Answer 3:
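To make the array idea concrete outside the database, here is a minimal Python sketch of the same three steps: collect each column in date_time order, drop the NULLs, take the last element. The names `rows`, `last_non_null` and `latest_row` are illustrative only and are not part of the answer; the rows are assumed to be sorted by date_time already.

```python
from typing import Optional, Sequence, Tuple

# Sample data from the question, assumed sorted by date_time ascending.
rows = [
    ("2013-10-15 11:45:00", None, "timtim", "fred"),
    ("2013-10-15 13:00:00", "tune", "reco", None),
    ("2013-10-16 12:00:00", "abc", None, None),
    ("2013-10-16 13:00:00", None, "died", None),
]


def last_non_null(values: Sequence[Optional[str]]) -> Optional[str]:
    """Drop None entries (like array_remove(arr, NULL)) and return the last one left."""
    non_null = [v for v in values if v is not None]
    return non_null[-1] if non_null else None


def latest_row(rows: Sequence[Tuple]) -> Optional[Tuple]:
    """Return (max date_time, last non-null a, last non-null b, ...)."""
    if not rows:
        return None
    columns = list(zip(*rows))  # one tuple per column, like one array per column
    max_date_time = max(columns[0])
    return (max_date_time, *(last_non_null(col) for col in columns[1:]))


print(latest_row(rows))
```

Like the SQL version, this reads every row of every column, so it is simple rather than fast.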

Order of rows

The "last row" and the sort order would need to be defined unambiguously. There is no natural order in a set (or a table). I am assuming ORDER BY ts, where ts is the timestamp column.
Like @Jorge pointed out in his comment: If ts is not UNIQUE, one needs to define tiebreakers for the sort order to make it unambiguous (add more items to ORDER BY). A primary key would be the ultimate solution.
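A small Python sketch of why the tiebreaker matters. The `id` column here is hypothetical (the table in the question has no such column); it stands in for any unique key appended to the sort order, much like ORDER BY ts DESC, id DESC would in SQL.

```python
# Rows are (id, ts, a). Two rows share the same ts, so "last by ts" alone is ambiguous.
rows = [
    (1, "2013-10-16 13:00:00", "x"),
    (2, "2013-10-16 13:00:00", "y"),
    (3, "2013-10-16 12:00:00", "z"),
]


def last_row(rows):
    """Pick the last row by ts, breaking ties with the (hypothetical) unique id."""
    return max(rows, key=lambda r: (r[1], r[0]))


# The result no longer depends on the order in which the rows happen to arrive.
print(last_row(rows))
print(last_row(list(reversed(rows))))
```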

General solution with window functions to get the result for every row in the table

SELECT ts
      ,max(a) OVER (PARTITION BY grp_a) AS a
      ,max(b) OVER (PARTITION BY grp_b) AS b
      ,max(c) OVER (PARTITION BY grp_c) AS c
FROM (
   SELECT *
         ,count(a) OVER (ORDER BY ts) AS grp_a
         ,count(b) OVER (ORDER BY ts) AS grp_b
         ,count(c) OVER (ORDER BY ts) AS grp_c
   FROM t
   ) sub;

How?

The aggregate function count() ignores NULL values when counting. Used as an aggregate-window function, it computes the running count of the column according to the default window definition, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This results in the count being "stuck" for rows with NULL values, thereby forming a peer group that should share the same (non-null) value.
In a second window function, the only non-null value per group is easily extracted with max().
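The grouping trick can be traced by hand with a short Python sketch. It assumes the values of one column are listed in ts order and that ts is unique; `running_count` and `fill_forward` are illustrative names, not anything from PostgreSQL.

```python
from itertools import accumulate
from typing import List, Optional


def running_count(values: List[Optional[str]]) -> List[int]:
    """Mimic count(col) OVER (ORDER BY ts): non-null values seen up to each row."""
    return list(accumulate(0 if v is None else 1 for v in values))


def fill_forward(values: List[Optional[str]]) -> List[Optional[str]]:
    """Mimic max(col) OVER (PARTITION BY grp): share each group's only non-null value."""
    groups = running_count(values)
    value_of_group = {}
    for grp, v in zip(groups, values):
        if v is not None:
            value_of_group[grp] = v  # at most one non-null value per group
    # Group 0 (leading NULLs) has no value, so those rows stay None.
    return [value_of_group.get(grp) for grp in groups]


a = [None, "tune", "abc", None]  # column a from the question, in ts order
print(running_count(a))  # the count "sticks" on the NULL in the last row
print(fill_forward(a))
```

Rows before the first non-null value all land in group 0 and stay NULL, which matches what the SQL query returns for them.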

Just the last row

WITH cte AS (
   SELECT *
         ,count(a) OVER w AS grp_a
         ,count(b) OVER w AS grp_b
         ,count(c) OVER w AS grp_c
   FROM   t
   WINDOW w AS (ORDER BY ts)
   )
SELECT ts
      ,max(a) OVER (PARTITION BY grp_a) AS a
      ,max(b) OVER (PARTITION BY grp_b) AS b
      ,max(c) OVER (PARTITION BY grp_c) AS c
FROM   cte
ORDER  BY ts DESC
LIMIT  1;

Simple alternatives for just the last row

SELECT ts
      ,COALESCE(a, (SELECT a FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS a
      ,COALESCE(b, (SELECT b FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS b
      ,COALESCE(c, (SELECT c FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS c
FROM   t
ORDER  BY ts DESC
LIMIT  1;

SELECT (SELECT ts FROM t                     ORDER BY ts DESC LIMIT 1) AS ts
      ,(SELECT a  FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1) AS a
      ,(SELECT b  FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1) AS b
      ,(SELECT c  FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1) AS c;

-> SQLfiddle

Performance

While this should be decently fast, if performance is your paramount requirement, I would use a plpgsql function. Start with the last row and loop descending until you have a non-null value for every column required. Along these lines:
GROUP BY and aggregate sequential numeric values
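The loop described above can be sketched in Python as a stand-in for the plpgsql function, which the answer does not show. It assumes rows arrive newest first as (ts, a, b, c) tuples; `last_known_values` is a hypothetical name. The point is the early exit: the scan stops as soon as every column has a value.

```python
from typing import Iterable, Optional, Tuple


def last_known_values(rows_desc: Iterable[Tuple], n_cols: int = 3) -> Tuple:
    """Walk rows from newest to oldest, keeping the first non-null value per column."""
    found = [None] * n_cols
    missing = n_cols
    ts: Optional[str] = None
    for row in rows_desc:
        if ts is None:
            ts = row[0]  # the timestamp always comes from the newest row
        for i, value in enumerate(row[1:1 + n_cols]):
            if found[i] is None and value is not None:
                found[i] = value
                missing -= 1
        if missing == 0:
            break  # every column is filled, older rows are never read
    return (ts, *found)


rows_desc = [
    ("2013-10-16 13:00:00", None, "died", None),
    ("2013-10-16 12:00:00", "abc", None, None),
    ("2013-10-15 13:00:00", "tune", "reco", None),
    ("2013-10-15 11:45:00", None, "timtim", "fred"),
]
print(last_known_values(rows_desc))
```

With the sample data all four rows are read, because column c is only set in the oldest one; on a large table where values are set frequently, the loop would typically stop after a handful of rows.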


