PostgreSQL create a new column with values conditioned on other columns

Submitted by 倾然丶 夕夏残阳落幕 on 2020-01-11 02:07:10

Question


I use PostgreSQL 9.1.2 and have a basic table (shown below) that stores the survival status of each entry both as a boolean (Survival) and as a number of days (Survival(Days)).

I have manually added a new column named 1-yr Survival, and now I want to fill in its value for each entry in the table, conditioned on that entry's Survival and Survival(Days) values. Once completed, the table would look something like this:

Survival    Survival(Days)    1-yr Survival
----------  --------------    -------------
Dead            200                NO
Alive            -                 YES
Dead            1200               YES

The pseudocode to fill in the conditional values of 1-yr Survival would be something like:

ALTER TABLE mytable ADD COLUMN "1-yr Survival" text;
for each row
    if ("Survival" = 'Dead' AND "Survival(Days)" < 365) then update "1-yr Survival" = 'NO'
    else update "1-yr Survival" = 'YES'
end
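To make the intended rule unambiguous, here is the pseudocode's row-by-row logic translated into plain Python. This is only an illustration and is not PostgreSQL code. It assumes each row is a dict with the hypothetical keys `survival` ("Dead" or "Alive") and `survival_days` (an int, or None where the table shows "-").

```python
def one_year_survival(survival, survival_days):
    """Apply the pseudocode's rule to a single row."""
    # Only a dead entry with fewer than 365 days of survival gets "NO".
    # For "Alive" rows the `and` short-circuits, so a missing day count
    # (None) is never compared with 365.
    if survival == "Dead" and survival_days < 365:
        return "NO"
    return "YES"


# The three example rows from the table above (hypothetical dict layout).
rows = [
    {"survival": "Dead", "survival_days": 200},
    {"survival": "Alive", "survival_days": None},
    {"survival": "Dead", "survival_days": 1200},
]

for row in rows:
    row["one_year_survival"] = one_year_survival(row["survival"], row["survival_days"])

print([row["one_year_survival"] for row in rows])  # ['NO', 'YES', 'YES']
```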

I believe this is a basic operation, but I couldn't find the PostgreSQL syntax to execute it. Some search results suggest adding a trigger, but I am not sure that is what I need. I think my situation is a lot simpler. Any help or advice would be greatly appreciated.


Answer 1:


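The one-time operation can be achieved with a plain UPDATE. An entry gets NO only if it is dead and survived fewer than 365 days, so the new value is simply the negation of that condition: alive, or at least 365 days of survival: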

UPDATE tbl
SET    one_year_survival = (survival OR survival_days >= 365);
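To see the UPDATE in action without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, which is an assumption of this sketch. SQLite has no real boolean type, so `survival` is stored as 1 (alive) or 0 (dead). The table name `tbl` and the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, survival INTEGER, "
    "survival_days INTEGER, one_year_survival INTEGER)"
)
# The three sample rows from the question: dead/200, alive/unknown, dead/1200.
conn.executemany(
    "INSERT INTO tbl (id, survival, survival_days) VALUES (?, ?, ?)",
    [(1, 0, 200), (2, 1, None), (3, 0, 1200)],
)

# Same expression as the answer: alive, or survived at least 365 days.
# For the alive row, 1 OR NULL evaluates to 1 under three-valued logic.
conn.execute("UPDATE tbl SET one_year_survival = (survival OR survival_days >= 365)")

result = conn.execute("SELECT id, one_year_survival FROM tbl ORDER BY id").fetchall()
print(result)  # [(1, 0), (2, 1), (3, 1)]
```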

I would advise against using camel-case, white space and parentheses in your names. While they are allowed between double quotes, they often lead to complications and confusion. See the chapter about identifiers and key words in the manual.

Are you aware that you can export the results of a query as CSV with COPY?
Example:

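COPY (SELECT *, (survival OR survival_days >= 365) AS one_year_survival FROM tbl)
TO '/path/to/file.csv' WITH (FORMAT csv, HEADER);

Without FORMAT csv, COPY writes its default tab-separated text format. Note also that COPY ... TO a file is executed by the server, writes the file on the database server and, in 9.1, requires superuser privileges. psql's \copy is the client-side alternative.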

This way you wouldn't need the redundant column in the first place.
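The same idea, computing the value at query time instead of storing it, can be sketched with Python's sqlite3 and csv modules. This is an analogue of the COPY example and does not use COPY itself. The derived column exists only in the query result, and the CSV is written to an in-memory buffer rather than a file. The table and column names are illustrative.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, survival INTEGER, survival_days INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 0, 200), (2, 1, None), (3, 0, 1200)],
)

# The derived value is calculated in the SELECT; no extra column is stored.
cursor = conn.execute(
    "SELECT *, (survival OR survival_days >= 365) AS one_year_survival "
    "FROM tbl ORDER BY id"
)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([col[0] for col in cursor.description])  # header line
writer.writerows(cursor)  # None (SQL NULL) is written as an empty field
csv_text = buffer.getvalue()
print(csv_text)
```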


Additional answer in response to a comment

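To avoid empty updates (writing rows whose values would not actually change), only touch rows where at least one new value differs from the stored one. IS DISTINCT FROM is used instead of <> because it treats NULL as an ordinary comparable value, whereas <> yields NULL: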

UPDATE tbl
SET    "Dead after 1-yr" = (dead AND my_survival_col < 365)
      ,"Dead after 2-yrs" = (dead AND my_survival_col < 730)
....
WHERE  "Dead after 1-yr" IS DISTINCT FROM (dead AND my_survival_col < 365)
   OR  "Dead after 2-yrs" IS DISTINCT FROM (dead AND my_survival_col < 730)
...
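The effect of that WHERE clause can be checked with a small sketch using Python's sqlite3 module. The column names below are hypothetical underscore versions of the quoted names above, and only two of the "..." columns are shown. SQLite's IS NOT operator stands in for PostgreSQL's IS DISTINCT FROM; both are null-safe inequality tests. The cursor's rowcount reports how many rows the UPDATE wrote.

```python
import sqlite3

# SQLite's null-safe IS NOT plays the role of PostgreSQL's IS DISTINCT FROM.
UPDATE_SQL = """
UPDATE tbl
SET    dead_after_1yr  = (dead AND my_survival_col < 365),
       dead_after_2yrs = (dead AND my_survival_col < 730)
WHERE  dead_after_1yr  IS NOT (dead AND my_survival_col < 365)
   OR  dead_after_2yrs IS NOT (dead AND my_survival_col < 730)
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, dead INTEGER, my_survival_col INTEGER, "
    "dead_after_1yr INTEGER, dead_after_2yrs INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl (id, dead, my_survival_col) VALUES (?, ?, ?)",
    [(1, 1, 200), (2, 0, 500), (3, 1, 600)],
)

first = conn.execute(UPDATE_SQL).rowcount   # derived columns start as NULL, so every row differs
second = conn.execute(UPDATE_SQL).rowcount  # nothing differs any more, so no row is written
print(first, second)  # 3 0
```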

Personally, I would only add such redundant columns if I had a compelling reason. Normally I wouldn't. If it's about performance: are you aware of indexes on expressions and partial indexes?




Answer 2:


Honestly, I think you are better off not storing data in the database that can be quickly and easily calculated from other stored data. A better option would be to simulate a calculated field (with the gotchas noted below). In this case you would do something like this (changing spaces etc. to underscores for easier maintenance):

CREATE FUNCTION one_year_survival(mytable)
RETURNS BOOL
IMMUTABLE
LANGUAGE SQL AS $$
select $1.survival OR $1.survival_days >= 365;
$$;

Then you can write:

SELECT *, m.one_year_survival FROM mytable m;

and it will "just work." Note the following gotchas:

  • m.one_year_survival is not included in the column list returned by SELECT *, and
  • you cannot omit the table qualifier (m in the example above), because it is the attribute notation m.one_year_survival that the parser converts into the function call one_year_survival(m).

The benefit, however, is that the value can be proven never to get out of sync with the columns it is computed from. Otherwise you end up with a rat's nest of check constraints.

You can actually take this approach quite far. See http://ledgersmbdev.blogspot.com/2012/08/postgresql-or-modelling-part-2-intro-to.html
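The "never out of sync" benefit described above is the same reasoning behind computed properties in application code. Here is a rough Python analogue (not PostgreSQL; the class name `SurvivalRecord` is hypothetical) in which the derived value is computed from the stored fields every time it is read. There is nothing extra to update and nothing to keep consistent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurvivalRecord:
    survival: bool                # True = alive
    survival_days: Optional[int]  # None when unknown

    @property
    def one_year_survival(self) -> bool:
        # Alive, or survived at least 365 days. The value is never stored,
        # so it always reflects the current field values. Unlike the SQL
        # version, an unknown day count for a dead entry yields False, not NULL.
        if self.survival:
            return True
        return self.survival_days is not None and self.survival_days >= 365


record = SurvivalRecord(survival=False, survival_days=200)
print(record.one_year_survival)  # False
record.survival_days = 1200      # change a stored field...
print(record.one_year_survival)  # True (...and the derived value follows)
```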



Source: https://stackoverflow.com/questions/12184409/postgresql-create-a-new-column-with-values-conditioned-on-other-columns
