问题
I'm trying to filter rows from the MySQL table where all the $_POST
data is stored from an online form. Sometimes the user's internet connection stalls or the browser screws up, and the new page after form submission is not displayed (though the INSERT worked and the table row was created). They then hit refresh, and submit their form twice, creating a duplicate row (except for the timestamp and autoincrement id columns).
I'd like to select unique form submissions. This has to be a really common task, but I can't seem to find something that lets me call with DISTINCT
applying to every column except the timestamp and id in a succinct way (sort of like SELECT id, timestamp, DISTINCT everything_else FROM table;
. At the moment, I can do:
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS (
SELECT DISTINCT everything,except,id,and,timestamp
FROM table1
);
SELECT * FROM table1 LEFT OUTER JOIN temp1
ON table1.everything = temp1.everything
...
;
My table has 20k rows with about 25 columns (classification features for a machine learning exercise). This query takes forever (as I presume it traverses the 20k rows 20K times?) I've never even let it run to completion. What's the standard practice way to do this?
Note: This question suggests add an index to the relevant columns, but there can be max 16 key parts to an index. Should I just choose the most likely unique ones? I can find about 700 duplicates in 2 seconds this way, but I can't be sure of not throwing away a unique row because I also have to ignore some columns when specifying the index.
回答1:
If you have a UNIQUE
key (other than an AUTO_INCREMENT
), simply use INSERT IGNORE ...
to silently avoid duplicate rows. If you don't have a UNIQUE key, do you never need to find a row again?
If you have already allowed duplicates and you need to get rid of them, that is a different question.
回答2:
I would try to eliminate the problem in the first place. There are techniques to eliminate this issue. The first one on my mind is that you could generate a random string and store it in both the session and as a hidden field in the form. This random string should be generated each time the form is displayed. When the user submits the form you need to check that the session key and the input key matches. Make sure to generate a different key on each request. Thus when a user refreshes the page he will submit an old key and it will not match.
Another solution could be that if this data should always be unique in the database check if there is that exact data in the database first before inserting. And if the data is unique by lets say the email address you can create a unique key index. Therefore that field will have to be unique in the table.
来源:https://stackoverflow.com/questions/29334624/mysql-select-distinct-rows-not-columns-to-filter-post-for-duplicates