How do I perform a batch insert in Django?

前端 未结 5 1665
隐瞒了意图╮
隐瞒了意图╮ 2020-12-23 20:12

In mysql, you can insert multiple rows to a table in one query for n > 0:

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9), ..., (n-2, n-1, n);


        
相关标签:
5条回答
  • 2020-12-23 20:49

    Here is way to do batch inserts that still goes through Django's ORM (and thus retains the many benefits the ORM provides). This approach involves subclassing the InsertQuery class as well as creating a custom manager that prepares model instances for insertion into the database in much the same way that Django's save() method uses. Most of the code for the BatchInsertQuery class below is straight from the InsertQuery class, with just a few key lines added or modified. To use the batch_insert method, pass in a set of model instances that you want to insert into the database. This approach frees up the code in your views from having to worry about translating model instances into valid SQL values; the manager class in conjunction with the BatchInsertQuery class handles that.

    from django.db import models, connection
    from django.db.models.sql import InsertQuery
    
    class BatchInsertQuery( InsertQuery ):
    
        ####################################################################
    
        def as_sql(self):
            """
            Constructs a SQL statement for inserting all of the model instances
            into the database.
    
            Differences from base class method:        
    
            - The VALUES clause is constructed differently to account for the
            grouping of the values (actually, placeholders) into
            parenthetically-enclosed groups. I.e., VALUES (a,b,c),(d,e,f)
            """
            qn = self.connection.ops.quote_name
            opts = self.model._meta
            result = ['INSERT INTO %s' % qn(opts.db_table)]
            result.append('(%s)' % ', '.join([qn(c) for c in self.columns]))
            result.append( 'VALUES %s' % ', '.join( '(%s)' % ', '.join( 
                values_group ) for values_group in self.values ) ) # This line is different
            params = self.params
            if self.return_id and self.connection.features.can_return_id_from_insert:
                col = "%s.%s" % (qn(opts.db_table), qn(opts.pk.column))
                r_fmt, r_params = self.connection.ops.return_insert_id()
                result.append(r_fmt % col)
                params = params + r_params
            return ' '.join(result), params
    
        ####################################################################
    
        def insert_values( self, insert_values ):
            """
            Adds the insert values to the instance. Can be called multiple times
            for multiple instances of the same model class.
    
            Differences from base class method:
    
            -Clears self.columns so that self.columns won't be duplicated for each
            set of inserted_values.        
            -appends the insert_values to self.values instead of extends so that
            the values (actually the placeholders) remain grouped separately for
            the VALUES clause of the SQL statement. I.e., VALUES (a,b,c),(d,e,f)
            -Removes inapplicable code
            """
            self.columns = [] # This line is new
    
            placeholders, values = [], []
            for field, val in insert_values:
                placeholders.append('%s')
    
                self.columns.append(field.column)
                values.append(val)
    
            self.params += tuple(values)
            self.values.append( placeholders ) # This line is different
    
    ########################################################################
    
    class ManagerEx( models.Manager ):
        """
        Extended model manager class.
        """
        def batch_insert( self, *instances ):
            """
            Issues a batch INSERT using the specified model instances.
            """
            cls = instances[0].__class__
            query = BatchInsertQuery( cls, connection )
            for instance in instances:
    
                 values = [ (f, f.get_db_prep_save( f.pre_save( instance, True ) ) ) \
                     for f in cls._meta.local_fields ]
                query.insert_values( values )
    
            return query.execute_sql()
    
    ########################################################################
    
    class MyModel( models.Model ):
        myfield = models.CharField(max_length=255)
        objects = ManagerEx()
    
    ########################################################################
    
    # USAGE:
    object1 = MyModel(myfield="foo")
    object2 = MyModel(myfield="bar") 
    object3 = MyModel(myfield="bam")
    MyModels.objects.batch_insert(object1,object2,object3)
    
    0 讨论(0)
  • 2020-12-23 20:51

    You might get the performance you need by doing manual transactions. What this will allow you to do is to create all the inserts in one transaction, then commit the transaction all at once. Hopefully this will help you: http://docs.djangoproject.com/en/dev/topics/db/transactions/

    0 讨论(0)
  • 2020-12-23 20:56

    I recently looked for such a thing myself (inspired by QuerySet.update(), as I imagine you are too). To my knowledge, no bulk create exists in the current production framework (1.1.1 as of today). We ended up creating a custom manager for the model that needed bulk-create, and created a function on that manager to build an appropriate SQL statement with the sequence of VALUES parameters.

    Something like (apologies if this does not work... hopefully I've adapted this runnably from our code):

    from django.db import models, connection
    
    class MyManager(models.Manager):
    
        def create_in_bulk(self, values):
            base_sql = "INSERT INTO tbl_name (a,b,c) VALUES "
            values_sql = []
            values_data = []
    
            for value_list in values:
                placeholders = ['%s' for i in range(len(value_list))]
                values_sql.append("(%s)" % ','.join(placeholders))
                values_data.extend(value_list)
    
            sql = '%s%s' % (base_sql, ', '.join(values_sql))
    
            curs = connection.cursor()
            curs.execute(sql, values_data)
    
    class MyObject(models.Model):
        # model definition as usual... assume:
        foo = models.CharField(max_length=128)
    
        # custom manager
        objects = MyManager()
    
    MyObject.objects.create_in_bulk( [('hello',), ('bye',), ('c', )] )
    

    This approach does run the risk of being very specific to a particular database. In our case, we wanted the function to return the IDs just created, so we had a postgres-specific query in the function to generate the requisite number of IDs from the primary key sequence for the table that represents the object. That said, it does perform significantly better in tests versus iterating over the data and issuing separate QuerySet.create() statements.

    0 讨论(0)
  • 2020-12-23 21:09

    No it is not possible because django models are objects rather than a table. so table actions are not applicable to django models. and django creates an object then inserts data in to the table therefore you can't create multiple object in one time.

    0 讨论(0)
  • 2020-12-23 21:10

    These answers are outdated. bulk_create has been brought in Django 1.4:

    https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create

    0 讨论(0)
提交回复
热议问题