What happens with duplicates when inserting multiple rows?

后端未结

关注

 2  1307

I am running a python script that inserts a large amount of data into a Postgres database, I use a single query to perform multiple row inserts:

INSERT INTO


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  旧时难觅i        
                
              
                            
                2020-12-11 06:35
              
            
            
                                                                       
The INSERT will just insert all rows and nothing special will happen, unless you have some kind of constraint disallowing duplicate / overlapping values (PRIMARY KEY, UNIQUE, CHECK or EXCLUDE constraint) - which you did not mention in your question. But that's what you are probably worried about.

Assuming a UNIQUE or PK constraint on (col1,col2), you are dealing with a textbook UPSERT situation. Many related questions and answers to find here.

Generally, if any constraint is violated, an exception is raised which (unless trapped in a procedural server-side language like plpgsql) will roll back not only the statement, but the whole transaction.

Without concurrent writes

I.e.: No other transactions will try to write to the same table at the same time.


Exclude rows that are already in the table with WHERE NOT EXISTS ... or any other applicable technique:


Select rows which are not present in other table

And don't forget to remove duplicates within the inserted set as well, which would not be excluded by the semi-anti-join WHERE NOT EXISTS ... 


One technique to deal with both at once would be EXCEPT:

INSERT INTO tbl (col1, col2)
VALUES
  (text 'v1', text 'v2')  -- explicit type cast may be needed in 1st row
, ('v3', 'v4')
, ('v3', 'v4')  -- beware of dupes in source
EXCEPT SELECT col1, col2 FROM tbl;


EXCEPT without the key word ALL folds duplicate rows in the source. If you know there are no dupes, EXCEPT ALL or one of the other techniques will be faster. Related:


Using EXCEPT clause in PostgreSQL


Generally, if the target table is big, WHERE NOT EXISTS in combination with DISTINCT on the source will probably be faster:

INSERT INTO tbl (col1, col2)
SELECT *
FROM  (
   SELECT DISTINCT *
   FROM  (
       VALUES
         (text 'v1', text'v2')
       , ('v3', 'v4')
       , ('v3', 'v4')  -- dupes in source
      ) t(c1, c2)
   ) t
WHERE NOT EXISTS (
   SELECT 1
   FROM   tbl
   WHERE  col1 = t.c1 AND col2 = t.c2
   );


If there can be many dupes, it pays to fold them in the source first. Else use one subquery less.

Related:


Select rows which are not present in other table


With concurrent writes

Use the Postgres UPSERT implementation INSERT ... ON CONFLICT ... in Postgres 9.5 or later:

INSERT INTO tbl (col1,col2)
SELECT DISTINCT *  -- still can't insert the same row more than once
FROM  (
   VALUES
     (text 'v1', text 'v2')
   , ('v3','v4')
   , ('v3','v4')  -- you still need to fold dupes in source!
  ) t(c1, c2)
ON CONFLICT DO NOTHING;  -- ignores rows with *any* conflict!


More elaborate related answer:


How to use RETURNING with ON CONFLICT in PostgreSQL?
How do I insert a row which contains a foreign key?


Documentation:


The manual
The commit page
The Postgres Wiki page


Craig's reference answer for UPSERT problems:


How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  情话喂你        
                
              
                            
                2020-12-11 06:37
              
            
            
                                                                       
Will it stop the entire query and throw an exception? Yes.

To avoid that, you can look on the following SO question here, which describes how to avoid Postgres from throwing an error for multiple inserts when some of the inserted keys already exist on the DB. 

You should basically do this:

INSERT INTO DBtable
        (id, field1)
    SELECT 1, 'value'
    WHERE
        NOT EXISTS (
            SELECT id FROM DBtable WHERE id = 1
);

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复