Question:
Not sure about the correct words to ask this question, so I will break it down.
I have a table as follows:
date_time | a | b | c
Last 4 rows:
15/10/2013 11:45:00 | null | 'timtim' | 'fred'
15/10/2013 13:00:00 | 'tune' | 'reco' | null
16/10/2013 12:00:00 | 'abc' | null | null
16/10/2013 13:00:00 | null | 'died' | null
How would I get the last record, but with each null replaced by the value from the most recent previous record that has one?
In my provided example the row returned would be
16/10/2013 13:00:00 | 'abc' | 'died' | 'fred'
As you can see, if the value for a column is null, the query goes back to the last record that has a value for that column and uses that value.
This should be possible, I just can't figure it out. So far I have only come up with:
select last_value(a) over w a
from test
WINDOW w AS (
    partition by a
    ORDER BY ts asc
    range between current row and unbounded following
);
But this only caters for a single column ...
Answer 1:
This should work, but keep in mind it is an ugly solution:
select *
from (
    select dt
    from (select rank() over (order by ctid desc) idx, dt
          from sometable) cx
    where idx = 1
) dtz, (
    select a
    from (select rank() over (order by ctid desc) idx, a
          from sometable
          where a is not null) ax
    where idx = 1
) az, (
    select b
    from (select rank() over (order by ctid desc) idx, b
          from sometable
          where b is not null) bx
    where idx = 1
) bz, (
    select c
    from (select rank() over (order by ctid desc) idx, c
          from sometable
          where c is not null) cx
    where idx = 1
) cz
See it here at fiddle: http://sqlfiddle.com/#!15/d5940/40
The result will be
DT | A | B | C
October, 16 2013 00:00:00+0000 | abc | died | fred
Answer 2:
Here I create an aggregation function that collects columns into arrays. Then it is just a matter of removing the NULLs and selecting the last element from each array.
Sample Data
CREATE TABLE T (
    date_time timestamp,
    a text,
    b text,
    c text
);

INSERT INTO T VALUES
    ('2013-10-15 11:45:00', NULL,   'timtim', 'fred'),
    ('2013-10-15 13:00:00', 'tune', 'reco',   NULL  ),
    ('2013-10-16 12:00:00', 'abc',  NULL,     NULL  ),
    ('2013-10-16 13:00:00', NULL,   'died',   NULL  );
Solution
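CREATE AGGREGATE array_accum (anyelement) (
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

-- ORDER BY inside each aggregate call fixes the input order.
-- A query-level ORDER BY date_time is rejected in an aggregate
-- query without GROUP BY, so it is moved into the aggregates.
WITH latest_nonull AS (
    SELECT MAX(date_time) AS MaxDateTime,
           array_remove(array_accum(a ORDER BY date_time), NULL) AS A,
           array_remove(array_accum(b ORDER BY date_time), NULL) AS B,
           array_remove(array_accum(c ORDER BY date_time), NULL) AS C
    FROM T
)
SELECT MaxDateTime,
       A[array_upper(A, 1)],
       B[array_upper(B, 1)],
       C[array_upper(C, 1)]
FROM latest_nonull;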
Result
     maxdatetime     |  a  |  b   |  c
---------------------+-----+------+------
 2013-10-16 13:00:00 | abc | died | fred
(1 row)
Answer 3:
Order of rows
The "last row" and the sort order would need to be defined unambiguously. There is no natural order in a set (or a table). I am assuming ORDER BY ts, where ts is the timestamp column.
Like @Jorge pointed out in his comment: If ts is not UNIQUE, one needs to define tiebreakers for the sort order to make it unambiguous (add more items to ORDER BY). A primary key would be the ultimate solution.
General solution with window functions to get the result for every row in the table
SELECT ts
      ,max(a) OVER (PARTITION BY grp_a) AS a
      ,max(b) OVER (PARTITION BY grp_b) AS b
      ,max(c) OVER (PARTITION BY grp_c) AS c
FROM (
    SELECT *
          ,count(a) OVER (ORDER BY ts) AS grp_a
          ,count(b) OVER (ORDER BY ts) AS grp_b
          ,count(c) OVER (ORDER BY ts) AS grp_c
    FROM t
) sub;
How?
The aggregate function count() ignores NULL values when counting. Used as an aggregate-window function, it computes the running count of the column according to the default window definition, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This results in the count being "stuck" for rows with NULL values, thereby forming a peer group that should share the same (non-null) value.
In a second window function, the only non-null value per group is easily extracted with max().
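To make the grouping trick concrete, here is a minimal Python sketch of the same two steps outside the database. It is only an illustration, not part of the SQL solution: it takes column a from the question's sample rows, assumes they are already sorted by ts, and assumes ts is unique.

```python
from itertools import accumulate

# Column "a" from the question's sample rows, already sorted by ts.
a = [None, "tune", "abc", None]

# Step 1: running count of non-null values,
# mirroring count(a) OVER (ORDER BY ts).
grp_a = list(accumulate(1 if v is not None else 0 for v in a))

# Step 2: each group holds at most one non-null value; spread it over the group,
# mirroring max(a) OVER (PARTITION BY grp_a).
value_of_group = {g: v for g, v in zip(grp_a, a) if v is not None}
filled_a = [value_of_group.get(g) for g in grp_a]

print(grp_a)     # [0, 1, 2, 2]
print(filled_a)  # [None, 'tune', 'abc', 'abc']
```

Group 0 has no non-null value, so rows before the first value stay null, just as they do in the SQL version.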
Just the last row
WITH cte AS (
    SELECT *
          ,count(a) OVER w AS grp_a
          ,count(b) OVER w AS grp_b
          ,count(c) OVER w AS grp_c
    FROM t
    WINDOW w AS (ORDER BY ts)
)
SELECT ts
      ,max(a) OVER (PARTITION BY grp_a) AS a
      ,max(b) OVER (PARTITION BY grp_b) AS b
      ,max(c) OVER (PARTITION BY grp_c) AS c
FROM cte
ORDER BY ts DESC
LIMIT 1;
Simple alternatives for just the last row
SELECT ts
      ,COALESCE(a, (SELECT a FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS a
      ,COALESCE(b, (SELECT b FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS b
      ,COALESCE(c, (SELECT c FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS c
FROM t
ORDER BY ts DESC
LIMIT 1;

SELECT (SELECT ts FROM t ORDER BY ts DESC LIMIT 1) AS ts
      ,(SELECT a FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1) AS a
      ,(SELECT b FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1) AS b
      ,(SELECT c FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1) AS c;
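For a quick local check of the scalar-subquery variant without a PostgreSQL server, the sketch below runs it against an in-memory SQLite database from Python. This is an assumption-laden stand-in: SQLite is not PostgreSQL, the timestamps are stored as ISO-formatted text (which sorts chronologically), and the table is filled with the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: ts stored as ISO text so that string order equals time order.
conn.execute("CREATE TABLE t (ts TEXT, a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2013-10-15 11:45:00", None, "timtim", "fred"),
        ("2013-10-15 13:00:00", "tune", "reco", None),
        ("2013-10-16 12:00:00", "abc", None, None),
        ("2013-10-16 13:00:00", None, "died", None),
    ],
)

# One scalar subquery per column: the latest ts, and the latest non-null value.
query = """
SELECT (SELECT ts FROM t ORDER BY ts DESC LIMIT 1) AS ts,
       (SELECT a FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1) AS a,
       (SELECT b FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1) AS b,
       (SELECT c FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1) AS c
"""
row = conn.execute(query).fetchone()
print(row)  # ('2013-10-16 13:00:00', 'abc', 'died', 'fred')
```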
-> SQLfiddle
Performance
While this should be decently fast, if performance is your paramount requirement, I would use a plpgsql function. Start with the last row and loop descending until you have a non-null value for every column required. Along these lines:
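The descending loop can be sketched outside the database as well. The following Python version only illustrates the idea and is not the plpgsql function; the helper name last_non_null and the list-of-dicts row format are hypothetical choices made for this example.

```python
def last_non_null(rows, columns):
    """Walk rows from newest to oldest and keep the first non-null value per column.

    rows: list of dicts sorted ascending by timestamp (hypothetical format).
    columns: names of the columns to fill.
    """
    result = {}
    missing = set(columns)
    for row in reversed(rows):
        for col in list(missing):
            if row[col] is not None:
                result[col] = row[col]
                missing.discard(col)
        if not missing:
            break  # every column has a value, no need to read older rows
    for col in missing:
        result[col] = None  # column is null in every row
    return result


rows = [
    {"ts": "2013-10-15 11:45:00", "a": None, "b": "timtim", "c": "fred"},
    {"ts": "2013-10-15 13:00:00", "a": "tune", "b": "reco", "c": None},
    {"ts": "2013-10-16 12:00:00", "a": "abc", "b": None, "c": None},
    {"ts": "2013-10-16 13:00:00", "a": None, "b": "died", "c": None},
]

filled = last_non_null(rows, ["a", "b", "c"])
filled["ts"] = rows[-1]["ts"]  # the timestamp always comes from the last row
print(filled)
```

The early exit is what makes this approach attractive for large tables: the loop stops as soon as every column has a value, so it usually reads only a few of the newest rows.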