PostgreSQL 9.3: Filter in Pivot table query

问题

Well I have this records:

  Employee_Number    Employee_role         Group_Name
  ----------------------------------------------------
  EMP101             C# Developer            Group_1      
  EMP102             ASP Developer           Group_1      
  EMP103             SQL Developer           Group_2      
  EMP104             PLSQL Developer         Group_2      
  EMP101             Java Developer          
  EMP102             Web Developer          
  EMP101             DBA          
  EMP105             DBA          
  EMP106             SQL Developer           Group_3      
  EMP107             Oracle Developer        Group_3      
  EMP101             Oracle Developer        Group_3

Want to show the pivot table for above records in the following format:

 Employee_Number     TotalRoles      TotalGroups       Available     Others     Group_1     Group_2      Group_3
 -----------------------------------------------------------------------------------------------------------------
 EMP101                   4               3                2            2          1                        1
 EMP102                   2               3                1            1          1 
 EMP103                   1               3                1            0                      1
 EMP104                   1               3                1            0                      1
 EMP105                   1               3                0            1
 EMP106                   1               3                1            0                                   1
 EMP107                   1               3                1            0                                   1

For the above result I am using the following script:

SELECT * FROM crosstab(
      $$SELECT grp.*, e.group_name
         , CASE WHEN e.employee_number IS NULL THEN 0 ELSE 1 END AS val
    FROM  (
       SELECT employee_number
        , count(employee_role)::int            AS total_roles
        , (SELECT count(DISTINCT group_name)::int
           FROM   employee
           WHERE  group_name <> '')            AS total_groups
        , count(group_name <> '' OR NULL)::INT AS available                    
        , count(group_name =  '' OR NULL)::int AS others
       FROM   employee
       GROUP  BY employee_number
       ) grp
    LEFT   JOIN employee e ON e.employee_number = grp.employee_number
                  AND e.group_name <> ''         
    ORDER  BY grp.employee_number, e.group_name$$
     ,$$VALUES ('Group_1'),('Group_2'),('Group_3')$$
   ) AS ct (employee_number text
      , total_roles  int
      , total_groups int
      , available    int
      , others       int
      , "Group_1"    int
      , "Group_2"    int
      , "Group_3"    int);

But:Now I want to show the pivot table for the above records by filtering the Group_Name. That means if I want to show the pivot table for the only Group_Name= Group_3 then it has to show only the employee who is only belongs to the Group_Name= Group_3 not other than that.

If I want to see the employee who is belongs to the Group_3 only than it has to show me:

   Employee_Number    total_roles  total_groups   available    others    Group_3
  ------------------------------------------------------------------------------- 
       EMP106             1             3            1            0            1
       EMP107             1             3            1            0            1

Note: As you can see in the first table the employee EMP106 and EMP107 is only belongs to the Group_Name = Group_3. The employee EMP101 is also belong but he also belongs to other groups so should not appear in this table.

回答1:

How to exclude the offending rows:

Select rows which are not present in other table

The crosstab() version adapted:

SELECT * FROM crosstab(
    $$SELECT grp.*, e.group_name
           , CASE WHEN e.employee_number IS NULL THEN 0 ELSE 1 END AS val
      FROM  (
         SELECT employee_number
              , count(employee_role)::int            AS total_roles
              , (SELECT count(DISTINCT group_name)::int
                 FROM   employee
                 WHERE  group_name <> '')            AS total_groups
              , count(group_name <> '' OR NULL)::int AS available
              , count(group_name = '' OR NULL)::int  AS others
         FROM   employee
         GROUP  BY employee_number
         ) grp
      JOIN   employee e USING (employee_number)
      WHERE  e.group_name = 'Group_3'
      AND    NOT EXISTS (
         SELECT 1 FROM employee
         WHERE  employee_number = e.employee_number
         AND    group_name  e.group_name
         )
      ORDER  BY employee_number$$
   ,$$VALUES ('Group_3')$$
   ) AS ct (employee_number text
      , total_roles  int
      , total_groups int
      , available    int
      , others       int
      , "Group_3"    int);

But as you can see, we don't need crosstab() here at all. Simplify to:

SELECT grp.*, 1 AS "Group_3"
FROM  (
   SELECT employee_number
        , count(employee_role)::int            AS total_roles
        , (SELECT count(DISTINCT group_name)::int
           FROM   employee
           WHERE  group_name <> '')            AS total_groups
        , count(group_name <> '' OR NULL)::int AS available
        , count(group_name = '' OR NULL)::int  AS others
   FROM   employee
   GROUP  BY employee_number
   ) grp
JOIN   employee e USING (employee_number)
WHERE  e.group_name = 'Group_3'
AND    NOT EXISTS (
   SELECT 1 FROM employee
   WHERE  employee_number = e.employee_number
   AND    group_name <> e.group_name
   )
ORDER  BY employee_number;

The column "Group_3" is really just noise here, because it is always 1 by definition.

If only a small percentage of rows is selected this way, this version with a LATERAL join should be substantially faster:

SELECT e.employee_number
     , grp.total_roles
     , total.total_groups
     , grp.available
     , grp.others
     , 1 AS "Group_3"
FROM  (
   SELECT employee_number
   FROM   employee e
   WHERE  group_name = 'Group_3'
   AND    NOT EXISTS (
      SELECT 1 FROM employee
      WHERE  employee_number = e.employee_number
      AND    group_name <> e.group_name
      )
   ) e
, LATERAL (
   SELECT count(employee_role)::int            AS total_roles
        , count(group_name <> '' OR NULL)::int AS available
        , count(group_name = '' OR NULL)::int  AS others
   FROM   employee
   WHERE  employee_number = e.employee_number
   GROUP  BY employee_number
   ) grp
,    (
   SELECT count(DISTINCT group_name)::int AS total_groups
   FROM   employee
   WHERE  group_name <> ''
   ) total
ORDER  BY employee_number;

Details for the LATERAL solution and performance:

Optimize GROUP BY query to retrieve latest record per user

Simple, generic solution for any set of groups

Not optimized for performance, but easy to adapt:

<original crosstab query from your question>
WHERE  "Group_3" = 1
AND    "Group_1" IS NULL
AND    "Group_2" IS NULL
AND    "Group_4" IS NULL
AND    others = 0  -- to rule out membership in the "empty" group
-- possibly more ...

来源：https://stackoverflow.com/questions/29145696/postgresql-9-3-filter-in-pivot-table-query

标签

sql

postgresql

pivot-table

postgresql-9.3