问题
When using LOAD DATA INFILE, is there a way to either flag a duplicate row, or dump any/all duplicates into a separate table?
回答1:
From the LOAD DATE INFILE documentation:
The REPLACE and IGNORE keywords control handling of input rows that duplicate existing rows on unique key values:
- If you specify REPLACE, input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row. See Section 12.2.7, “REPLACE Syntax”.
- If you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped. If you do not specify either option, the behavior depends on whether the LOCAL keyword is specified. Without LOCAL, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. With LOCAL, the default behavior is the same as if IGNORE is specified; this is because the server has no way to stop transmission of the file in the middle of the operation.
Effectively, there's no way to redirect the duplicate records to a different table. You'd have to load them all in, and then create another table to hold the non-duplicated records.
回答2:
It looks as if there actually is something you can do when it comes to duplicate rows for LOAD DATA calls. However, the approach that I've found isn't perfect: it acts more as a log for all deletes on a table, instead of just for LOAD DATA calls. Here's my approach:
Table test:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
text VARCHAR(255) DEFAULT NULL
);
Table test_log:
CREATE TABLE test_log (
id INTEGER, -- not primary key, we want to accept duplicate rows
text VARCHAR(255) DEFAULT NULL,
time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Trigger del_chk:
delimiter //
drop trigger if exists del_chk;
CREATE TRIGGER del_chk AFTER DELETE ON test
FOR EACH ROW
BEGIN
INSERT INTO test_log(id,text) values(OLD.id,OLD.text);
END;//
delimiter ;
Test import (/home/user/test.csv
):
1,asdf
2,jkl
3,qwer
1,tyui
1,zxcv
2,bnm
Query:
LOAD DATA INFILE '/home/ken/test.csv'
REPLACE INTO TABLE test
FIELDS
TERMINATED BY ','
LINES
TERMINATED BY '\n' (id,text);
Running the above query will result in 1,asdf
, 1,tyui
, and 2,jkl
being added to the log table. Based on a timestamp, it could be possible to associate the rows with a particular LOAD DATA
statement.
来源:https://stackoverflow.com/questions/1965001/mysql-duplicates-with-load-data-infile