问题
I have a database (Postgres 9.3.5) of 80 millions records, the insert
query below fails with:
ERROR: invalid input syntax for integer: ""
INSERT INTO DISCOGS.TRACK_DURATION
SELECT
track_id,
duration,
hours_as_seconds + minutes_as_seconds + seconds as total_seconds
FROM (
select
track_id,
duration,
CASE
WHEN duration like '%:%:%' THEN (split_part(duration, ':', 1))::bigint * 60 * 60
ELSE 0
END as hours_as_seconds,
CASE
WHEN duration like '%:%:%' THEN (split_part(duration, ':', 2))::bigint * 60
WHEN duration like '%:%' THEN (split_part(duration, ':', 1))::bigint * 60
ELSE 0
END as minutes_as_seconds,
CASE
WHEN duration like '%:%:%' THEN (split_part(duration, ':', 3))::bigint
WHEN duration like '%:%' THEN (split_part(duration, ':', 2))::bigint
ELSE 0
END as seconds
from discogs.track t1
where release_id < 10000000
and t1.duration!='' and t1.duration is not null
and t1.position!=''
) as s1
I can use the where release_id
to limit the number of records checked and with lower values its fine, so its bad data , but with so many records how do I find the problem data. Note I'm already filtering out values where duration is empty string and I also found a few records with bad data (such as %%%%) that I have changed but it is still failing.
回答1:
I would search for malformed durations using a regular expression, as in:
create table duration (
d varchar(20)
);
insert into duration (d) values ('12:34:56');
insert into duration (d) values ('34:56');
insert into duration (d) values ('15::'); -- bad one
insert into duration (d) values (':34:56'); -- bad one
insert into duration (d) values (':34:'); -- bad one
insert into duration (d) values ('12:34:'); -- bad one
insert into duration (d) values ('34:'); -- bad one
insert into duration (d) values (':56'); -- bad one
select *
from duration
where d not similar to '([0-9]+:)?[0-9]+:[0-9]+'
Result:
d
------
15::
:34:56
:34:
12:34:
34:
:56
In your case the query should look like:
select track_id, duration
from discogs.track
where duration not similar to '([0-9]+:)?[0-9]+:[0-9]+';
来源:https://stackoverflow.com/questions/51897967/how-to-find-out-bad-data-causing-this-insert-to-fail