PostgreSQL: LEFT JOIN creates blank row

与世无争的帅哥 提交于 2019-12-24 05:30:12

问题


See important new discoveries 1 and 2 at end of this explanation.

I am running Postgres 9.1.3 and am having a weird left join issue.

I have a table named consistent.master with over 2 million rows. It has a column named citation_id, and that column has no nulls. I can verify that with this:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

That returns 0.

Here's where it gets weird: if I LEFT JOIN this table to a temporary table, I get an error that I am trying to insert a null into the citation_id field:

ERROR: null value in column "citation_id" violates not-null constraint SQL state: 23502

Here's the query:

WITH stops AS (
    SELECT citation_id,
           rank() OVER (ORDER BY offense_timestamp,
                     defendant_dl,
                     offense_street_number,
                     offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction=1
)

INSERT INTO consistent.masternew (arrest_id, citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom)

SELECT stops.stop, master.citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id

I'm scratching my head on this one. If this is a LEFT JOIN, and if consistent.master is the join's left table, how could this query create null values in the citation_id column when there aren't any to begin with?

Here's the SQL code I used to create the table:

CREATE TABLE consistent.masternew
(
  arrest_id character varying(20),
  citation_id character varying(20) NOT NULL,
  defendant_dl character varying(20),
  defendant_dl_state character varying(2),
  defendant_zip character varying(9),
  defendant_race character varying(10),
  defendant_sex character(1),
  defendant_dob date,
  vehicle_licenseplate character varying(10),
  vehicle_licenseplate_state character(2),
  vehicle_registration_expiration_date date,
  vehicle_year integer,
  vehicle_make character varying(20),
  vehicle_model character varying(20),
  vehicle_color character varying,
  offense_timestamp timestamp without time zone,
  offense_street_number character varying(10),
  offense_street_name character varying(30),
  offense_crossstreet_number character varying(10),
  offense_crossstreet_name character varying(30),
  offense_county character varying(10),
  officer_id character varying(20),
  offense_code integer,
  speed_alleged integer,
  speed_limit integer,
  work_zone bit(1),
  school_zone bit(1),
  offense_location point,
  id serial NOT NULL,
  source character varying(20), -- Where this citation came from--court, PD, etc.
  citing_jurisdiction integer,
  the_geom geometry,
  CONSTRAINT masternew_pkey PRIMARY KEY (id ),
  CONSTRAINT citing_jurisdiction FOREIGN KEY (citing_jurisdiction)
      REFERENCES consistent.jurisdictions (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT offenses FOREIGN KEY (offense_code)
      REFERENCES consistent.offenses (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
  CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
  CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 3081)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE consistent.masternew
  OWNER TO postgres;
COMMENT ON COLUMN consistent.masternew.source IS 'Where this citation came from--court, PD, etc.';

CREATE INDEX masternew_citation_id_idx
  ON consistent.masternew
  USING btree
  (citation_id COLLATE pg_catalog."default" );

CREATE INDEX masternew_citing_jurisdiction_idx
  ON consistent.masternew
  USING btree
  (citing_jurisdiction );

CREATE INDEX masternew_defendant_dl_idx
  ON consistent.masternew
  USING btree
  (defendant_dl COLLATE pg_catalog."default" );

CREATE INDEX masternew_id_idx
  ON consistent.masternew
  USING btree
  (id );

CREATE INDEX masternew_offense_street_name_idx
  ON consistent.masternew
  USING btree
  (offense_street_name COLLATE pg_catalog."default" );

CREATE INDEX masternew_offense_street_number_idx
  ON consistent.masternew
  USING btree
  (offense_street_number COLLATE pg_catalog."default" );

CREATE INDEX masternew_offense_timestamp_idx
  ON consistent.masternew
  USING btree
  (offense_timestamp );

CREATE INDEX masternew_the_geom_idx
  ON consistent.masternew
  USING gist
  (the_geom );

IMPORTANT DISCOVERY 1

I just discovered something interesting. This query:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NOT NULL
UNION
SELECT COUNT(*)
FROM consistent.master
UNION
SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

The results are:

2085344
2085343
0

How can I possibly explain that? How can the count with WHERE citation_id IS NOT NULL possibly be higher than the same query with no WHERE clause?

IMPORTANT DISCOVERY 2 OK, per the comments below, I discovered that I have a row with all empty values, and this is despite the fact that the table has a serial id column and some NOT NULL constraints.

I deleted the bum row. Now I'm not getting the null error. Instead, I'm getting this:

ERROR:  duplicate key value violates unique constraint "masternew_pkey"
DETAIL:  Key (id)=(1583804) already exists.

********** Error **********

ERROR: duplicate key value violates unique constraint "masternew_pkey"
SQL state: 23505
Detail: Key (id)=(1583804) already exists.

So just to make sure, I do this query:

SELECT COUNT(id)
FROM consistent.master
WHERE id=1583804;

Guess what? consistent.master only has 1 instance of this! So given that the left table in the LEFT JOIN only 1 instance of 1583804 in citation_id and that the id column can only come from the left table, how could this error possibly happen? A LEFT JOIN like this should not cause the final result to have more rows than the left table, right?


回答1:


With an INSERT, especially with a complex one, you should always define the target columns. So make that:

INSERT INTO consistent.masternew (citation_id, col1, col2, ...)

If anything goes wrong in the accompanying SELECT statement - like this:

the_geom geometry

(makes no sense to rename the column with a type name - I assume this is unintended) - or if the underlying table definition changes, an INSERT statement without defined target columns can go terribly wrong.

PostgreSQL does not enforce the same number of columns in the SELECT statement as are in the target table. I quote the fine manual on that:

Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is none.

(Bold emphasis mine.) If you have a mismatch in the column list, this could make a NULL value appear "out of nowhere".

Also, the order of the columns in the SELECT statement has to match the order of the columns to insert into. If the target columns are not spelled out, this would be the order of the columns in your table as it was created.
You seem to expect that columns are matched by name automatically, but that is not so. Column names in the SELECT statement are completely irrelevant for the final step of the INSERT. Only their order from left to right is significant.

Contrary to what others have implied the WITH clause is perfectly legit. I quote the manual on INSERT:

It is possible for the query (SELECT statement) to also contain a WITH clause. In such a case both sets of with_query can be referenced within the query, but the second one takes precedence since it is more closely nested.

Your statement could look like this:

WITH stops AS (
    SELECT citation_id
          ,rank() OVER (ORDER BY
                    offense_timestamp
                   ,defendant_dl
                   ,offense_street_number
                   ,offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction = 1
    )
INSERT INTO consistent.masternew (citation_id, col1, col2, ...) -- add columns
SELECT m.citation_id -- order colums accordingly!
      ,s.stop
      ,m.defendant_dl
        -- 27 more columns
      ,m.citing_jurisdiction
      ,m.the_geom
FROM   consistent.master m
LEFT   JOIN stops s USING (citation_id);



回答2:


At a guess I'd say that you are inserting stops.stop, which could be null, into the citation_id column, but without knowing the table structure I can't say for sure :)

EDIT: Try @vol7ron's suggestion and name the columns...



来源:https://stackoverflow.com/questions/9713537/postgresql-left-join-creates-blank-row

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!