I need to insert multiple rows with one query (the number of rows is not constant), so I need to execute a query like this one:
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);
Another nice and efficient approach is to pass the rows for insertion as a single argument, which is an array of JSON objects.
E.g. you pass this argument:
[{"id": 18, "score": 1}, {"id": 19, "score": 5}]
It is an array that may contain any number of objects. Then your SQL looks like:
INSERT INTO links (parent_id, child_id, score)
SELECT 123, (r->>'id')::int, (r->>'score')::int
FROM unnest($1::json[]) as r
Note: your PostgreSQL version must be new enough to support the json type.
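For example, from psycopg2 you could pass the array like this (a minimal sketch with a hypothetical connection string, reusing the table and columns from the SQL above; psycopg2 adapts a Python list of strings to a Postgres array, which the query then casts to json[]):
import json
import psycopg2

rows = [{"id": 18, "score": 1}, {"id": 19, "score": 5}]

conn = psycopg2.connect("dbname=test")  # hypothetical connection string
cur = conn.cursor()
cur.execute("""
    INSERT INTO links (parent_id, child_id, score)
    SELECT 123, (r->>'id')::int, (r->>'score')::int
    FROM unnest(%s::json[]) AS r
""", ([json.dumps(r) for r in rows],))  # single argument: a list of JSON strings
conn.commit()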
I built a program that inserts multiple rows to a server located in another city.
I found out that using this method was about 10 times faster than executemany. In my case tup is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
I've been using ant32's answer above for several years. However, I've found that it throws an error in Python 3 because mogrify returns a byte string.
Converting explicitly to byte strings is a simple solution for making the code Python 3 compatible.
args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_str)
The cursor.copy_from solution as provided by @joseph.sheedy (https://stackoverflow.com/users/958118/joseph-sheedy) above (https://stackoverflow.com/a/30721460/11100064) is indeed lightning fast.
However, the example he gives is not generically usable for a record with any number of fields, and it took me a while to figure out how to use it correctly.
The IteratorFile needs to be instantiated with tab-separated fields like this (records is a list of dicts where each dict is a record):
f = IteratorFile("{0}\t{1}\t{2}\t{3}\t{4}".format(r["id"],
r["type"],
r["item"],
r["month"],
r["revenue"]) for r in records)
To generalise for an arbitrary number of fields, we first create a line template with the correct number of tab-separated placeholders, "{}\t{}\t{}...\t{}", and then use .format(*list(r.values())) to fill in the field values for each r in records:
line = "\t".join(["{}"] * len(records[0]))
f = IteratorFile(line.format(*list(r.values())) for r in records)
The complete function is in a gist here.
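Putting it together, a generalised helper could look roughly like this (a sketch under my own assumptions, not the gist itself; it relies on the IteratorFile class referenced above, uses an explicit column list to fix the field order, and assumes the values contain no tabs, newlines or backslashes):
def copy_records(cur, table, columns, records):
    # Build one tab-separated line per record and stream it through
    # copy_from via the file-like IteratorFile wrapper.
    line = "\t".join(["{}"] * len(columns))
    f = IteratorFile(line.format(*(r[c] for c in columns)) for r in records)
    cur.copy_from(f, table, columns=columns)

copy_records(cur, "sales", ("id", "type", "item", "month", "revenue"), records)  # hypothetical table name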
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands of rows or more. It is also more memory efficient than building a giant query string: an iterator only ever holds one input record in memory at a time, whereas building the query string can eventually exhaust memory in your Python process or in Postgres.
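For reference, a minimal file-like wrapper of this kind might look roughly like the following (a sketch under my own assumptions, not the actual gist; it only implements the read()/readline() interface that copy_from needs):
class IteratorFile(object):
    """Wraps an iterator of strings (one record per string) so it can be
    read like a text file, e.g. by cursor.copy_from."""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._buf = ""

    def read(self, n=-1):
        if n is None or n < 0:
            # Read everything that is left in one go.
            data = self._buf + "".join(rec + "\n" for rec in self._iter)
            self._buf = ""
            return data
        # Pull records until the buffer holds at least n characters,
        # appending a newline so each record becomes one COPY line.
        while len(self._buf) < n:
            try:
                self._buf += next(self._iter) + "\n"
            except StopIteration:
                break
        chunk, self._buf = self._buf[:n], self._buf[n:]
        return chunk

    def readline(self):
        while "\n" not in self._buf:
            try:
                self._buf += next(self._iter) + "\n"
            except StopIteration:
                break
        idx = self._buf.find("\n")
        if idx == -1:  # iterator exhausted, return whatever is left
            line, self._buf = self._buf, ""
        else:
            line, self._buf = self._buf[:idx + 1], self._buf[idx + 1:]
        return line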
All of these techniques are called 'Extended Inserts' in Postgres terminology, and as of the 24th of November 2016, it's still a ton faster than psycopg2's executemany() and all the other methods listed in this thread (which I tried before coming to this answer).
Here's some code which doesn't use cur.mogrify and is nice and simple to get your head around:
valueSQL = [ '%s', '%s', '%s', ... ] # as many as you have columns.
sqlrows = []
rowsPerInsert = 3 # more means faster, but with diminishing returns..
for row in getSomeData:
    # row == [1, 'a', 'yolo', ... ]
    sqlrows += row
    if (len(sqlrows) // len(valueSQL)) % rowsPerInsert == 0:
        # sqlrows == [ 1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie' ]
        insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')'] * rowsPerInsert)
        cur.execute(insertSQL, sqlrows)
        con.commit()
        sqlrows = []

# insert whatever is left over after the last full batch
if sqlrows:
    insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')'] * (len(sqlrows) // len(valueSQL)))
    cur.execute(insertSQL, sqlrows)
    con.commit()
But it should be noted that if you can use copy_from(), you should use copy_from ;)