Count the number of attributes that are NULL for a row

柔情痞子 submitted on 2021-01-27 04:38:34

Question


I want to add a new column to a table that records, for each tuple (row), the number of attributes whose value is NULL. How can I get that number with SQL?

For example, if a tuple looks like this:

Name | Age | Sex
-----+-----+-----
Blice| 100 | null

I want to update the tuple to this:

Name | Age | Sex | nNULL
-----+-----+-----+--------
Blice| 100 | null|  1

Also, because I'm writing a PL/pgSQL function and the table name is obtained from argument, I don't know the schema of a table beforehand. That means I need to update the table with the input table name. Anyone know how to do this?


Answer 1:


Possible without spelling out columns. Counter-pivot columns to rows and count.

The aggregate function count(<expression>) only counts non-null values, while count(*) counts all rows. The shortest and fastest way to count NULL values for more than a few columns is count(*) - count(col) ...

Works for any table with any number of columns of any data types.
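To see the count(*) - count(col) arithmetic in isolation, here is a minimal sketch on three literal values, one of them NULL. The alias x(v) is made up for the example.

```sql
-- Three input values, one of them NULL.
SELECT count(*)            AS total_rows,  -- counts every row: 3
       count(v)            AS non_null,    -- skips the NULL: 2
       count(*) - count(v) AS ct_nulls     -- the difference: 1
FROM  (VALUES ('Blice'), ('100'), (NULL)) AS x(v);
```

The queries below apply the same subtraction to the columns of one row after turning them into a set of values.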

In Postgres 9.3+ with built-in JSON functions:

SELECT *, (SELECT count(*) - count(v)
           FROM json_each_text(row_to_json(t)) x(k,v)) AS ct_nulls
FROM   tbl t;

What is x(k,v)?

json_each_text() returns a set of rows with two columns. The default column names are key and value, as documented in the manual. I provided table and column aliases so we don't have to rely on the default names. The second column is named v.
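As an illustration of what the subquery iterates over, the following sketch feeds a hand-written JSON literal (standing in for row_to_json(t) of the question's example row) to json_each_text(). The literal and the output column is_null are assumptions for the demo. In current Postgres versions a JSON null comes back as an SQL NULL in the text column, which is what lets count(v) skip it.

```sql
-- One output row per key; k holds the column name, v the value as text.
SELECT k, v, v IS NULL AS is_null
FROM   json_each_text('{"name": "Blice", "age": 100, "sex": null}'::json) AS x(k, v);
```

Only the row for sex should report is_null as true, so count(*) - count(v) over these three rows gives 1.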

Or, in any Postgres version since at least 8.3 with the additional module hstore installed, even shorter and a bit faster:

SELECT *,  (SELECT count(*) - count(v) FROM svals(hstore(t)) v) AS ct_nulls
FROM   tbl t;

This simpler version only returns a set of single values. I only provide a simple alias v, which is automatically taken to be table and column alias.

  • Best way to install hstore on multiple schemas in a Postgres database?

Since the additional column is functionally dependent, I would consider not persisting it in the table at all. Rather, compute it on the fly as demonstrated above, or create a tiny function with a polymorphic input type for the purpose:

CREATE OR REPLACE FUNCTION f_ct_nulls(_row anyelement)
  RETURNS int  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (count(*) - count(v))::int FROM svals(hstore(_row)) v';

(PARALLEL SAFE only for Postgres 9.6 or later.)

Then:

SELECT *, f_ct_nulls(t) AS ct_nulls
FROM   tbl t;

You could wrap this into a VIEW ...
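A minimal sketch of such a view, assuming a hypothetical table tbl shaped like the question's example. It uses the JSON variant from above so that it needs neither hstore nor f_ct_nulls(); the view name tbl_ct_nulls is made up.

```sql
-- Hypothetical demo table matching the question's example.
CREATE TABLE IF NOT EXISTS tbl (name text, age int, sex text);

-- The view computes the count on every read, so it cannot go stale.
CREATE VIEW tbl_ct_nulls AS
SELECT t.*,
       (SELECT count(*) - count(v)
        FROM   json_each_text(row_to_json(t)) x(k, v)) AS ct_nulls
FROM   tbl t;

SELECT * FROM tbl_ct_nulls;
```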

SQL Fiddle demonstrating all.

This should also answer your second question:

... the table name is obtained from argument, I don't know the schema of a table beforehand. That means I need to update the table with the input table name.




Answer 2:


In Postgres, you can express this as:

select t.*,
       ((name is null)::int +
        (age is null)::int +
        (sex is null)::int
       ) as numnulls
from tbl t;

To implement this for an unknown table, you will need dynamic SQL and a list of the table's columns (say, from information_schema.columns).
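One way to build that statement is sketched below. It assumes a hypothetical table tbl in schema public (created here only so the example runs on its own) and generates the text of a query with one (col IS NULL)::int term per column.

```sql
-- Hypothetical demo table; assumed to end up in schema public.
CREATE TABLE IF NOT EXISTS tbl (name text, age int, sex text);

-- Build the query text; %I quotes each identifier where needed.
SELECT format('SELECT t.*, %s AS numnulls FROM %I.%I t',
              string_agg(format('(t.%I IS NULL)::int', column_name), ' + '
                         ORDER BY ordinal_position),
              table_schema, table_name) AS generated_sql
FROM   information_schema.columns
WHERE  table_schema = 'public'
AND    table_name   = 'tbl'
GROUP  BY table_schema, table_name;
```

The resulting text can then be run with EXECUTE inside a PL/pgSQL function, where the table name would come from a parameter instead of the literal 'tbl'.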




Answer 3:


Function to add column automatically

This is an audited version of what @winged panther posted, per request.

The function adds a column with the given name to any existing table for which the calling role has the necessary privileges:

CREATE OR REPLACE FUNCTION f_add_null_count(_tbl regclass, _newcol text)
  RETURNS void AS
$func$
BEGIN
   -- add new col
   EXECUTE format('ALTER TABLE %s ADD COLUMN %I smallint', _tbl, _newcol);

   -- update new col with dynamic count of nulls
   EXECUTE (
      SELECT format('UPDATE %s SET %I = (', _tbl, _newcol)  -- regclass used as text
          || string_agg(quote_ident(attname), ' IS NULL)::int + (')
          || ' IS NULL)::int'
      FROM   pg_catalog.pg_attribute
      WHERE  attnum > 0
      AND    NOT attisdropped
      AND    attrelid = _tbl  -- regclass used as OID
      AND    attname <> _newcol  -- no escaping here, it's the *text*!
      );
END
$func$  LANGUAGE plpgsql;

SQL Fiddle demo.
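A short usage sketch, assuming f_add_null_count() has already been created as shown above. The table det_demo and the column name nnull are hypothetical.

```sql
-- Hypothetical demo table with one, two and three NULLs per row.
CREATE TABLE det_demo (name text, age integer, sex text);
INSERT INTO det_demo (name, age, sex) VALUES
  ('Blice', 100,  NULL),
  ('Glizz', NULL, NULL),
  (NULL,    NULL, NULL);

-- The text literal is cast to regclass, so an unknown table name fails here.
SELECT f_add_null_count('det_demo', 'nnull');

SELECT * FROM det_demo;   -- nnull is 1, 2 and 3 for the three rows
```

Note that the stored count is a snapshot taken when the function runs; later inserts and updates do not refresh it.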

How to treat identifiers properly

  • Sanitize identifiers with a cast to regclass, format() with %I, or quote_ident(). I am using all three techniques in the example; each happens to be the best choice where it is used. More here:
    • Table name as a PostgreSQL function parameter

In the function above, these are the regclass parameter type, the %I placeholders in format(), and the quote_ident(attname) call.
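To get a feel for what the three techniques do to a name, here is a small sketch. The identifiers 'nnull', 'My Col' and 'no_such_table' are made up for the demo.

```sql
SELECT quote_ident('nnull')   AS plain_name,   -- stays as is: nnull
       quote_ident('My Col')  AS quoted_name,  -- becomes "My Col"
       format('%I', 'My Col') AS via_format;   -- same result: "My Col"

-- A regclass cast resolves the name against the catalog ...
SELECT 'pg_catalog.pg_attribute'::regclass;
-- ... and raises an error for a table that does not exist:
-- SELECT 'no_such_table'::regclass;  -- relation "no_such_table" does not exist
```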

Other points

  • I am basing my query on pg_catalog.pg_attribute, but that's an optional decision with pros and cons. It makes my query simpler and faster because I can use the OID of the table. Related:

    • How to check if a table exists in a given schema
    • Select columns with particular column names in PostgreSQL
  • You have to exclude the newly added column from the count, or the count will be off by one.

  • Using data type smallint for the count, since there cannot be more than 1600 columns in a table.

  • I don't use a variable but execute the result of the SELECT statement directly. Assignments are comparatively expensive in plpgsql. Not a big deal, though. Also a matter of taste and style.

  • I make it a habit to prepend parameters and variables with an underscore (_tbl) to rule out ambiguity between variables and column names.




Answer 4:


I just created a function to meet the OP's requirement, building on Gordon Linoff's answer, with the following table and data:

Table det:

CREATE TABLE det (
  name text,
  age integer,
  sex text
);

Data:

insert into det (name,age,sex) values
  ('Blice',100,NULL),
  ('Glizz',NULL,NULL),
  (NULL,NULL,NULL);

Function:

create or replace function fn_alter_nulls(tbl text, new_col text) returns void as
$$
declare
   vals text;
begin
   -- build "(col1 is null)::int+(col2 is null)::int+..." from the existing columns;
   -- this runs before the new column is added, so the new column is not counted
   select string_agg(format('(%I is null)::int', column_name), '+') into vals
   from   information_schema.columns
   where  table_schema = 'public'   -- assumes the table lives in schema public
   and    table_name = tbl;

   -- add the new column; %I quotes the identifiers safely
   execute format('alter table %I add column %I int', tbl, new_col);

   -- fill the new column in the table that was passed in (not a hard-coded one)
   execute format('update %I set %I = (%s)', tbl, new_col, vals);
end;
$$
language plpgsql;

Function call:

select fn_alter_nulls('det','nnulls');



Answer 5:


Since the null count is derived data and simple/cheap to determine at query time, why not create a view:

create view MyTableWithNullCount as
select
  *, 
  case when nullableColumn1 is null then 1 else 0 end +
  case when nullableColumn2 is null then 1 else 0 end +
  ...
  case when nullableColumnn is null then 1 else 0 end as nNull
from myTable

And just use the view instead.

This has the upside of not having to write triggers/code to maintain a physical null count column, which will be a bigger headache than this approach.
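Spelled out for the three columns of the question's example, such a view could look like the sketch below. The table tbl and the view name tbl_with_nnull are hypothetical.

```sql
-- Hypothetical demo table matching the question's example.
CREATE TABLE IF NOT EXISTS tbl (name text, age int, sex text);

-- One CASE term per nullable column; each contributes 1 when the column is NULL.
CREATE VIEW tbl_with_nnull AS
SELECT t.*,
       CASE WHEN t.name IS NULL THEN 1 ELSE 0 END +
       CASE WHEN t.age  IS NULL THEN 1 ELSE 0 END +
       CASE WHEN t.sex  IS NULL THEN 1 ELSE 0 END AS nnull
FROM   tbl t;
```

Unlike the generic variants in the first answer, this form has to be updated by hand whenever a column is added to the table.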



Source: https://stackoverflow.com/questions/31444591/count-the-number-of-attributes-that-are-null-for-a-row
