Bulk inserts taking longer than expected using Dapper

Backend · Open · 6 answers · 614 views
难免孤独 asked 2020-11-29 18:40

After reading this article, I decided to take a closer look at the way I was using Dapper.

I ran this code on an empty database:

var members = new List<Member>();
// ... populate the list with test records, then insert them with connection.Execute ...
6 Answers
  • 2020-11-29 18:45

    Using the Execute method with a single INSERT statement will never perform a bulk insert, no matter how many rows you pass. Even the accepted answer, which wraps the calls in a Transaction, doesn't do a bulk insert.

    If you want to perform a bulk insert, use the SqlBulkCopy class: https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy

    You will not find anything faster than this.
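    As a minimal sketch of SqlBulkCopy usage (assuming the Member(Username, IsActive) table from the question, a populated members list, and an open SqlConnection named connection — all taken from the question, not from this answer):

    // Build an in-memory DataTable whose columns match the destination table
    var table = new DataTable();
    table.Columns.Add("Username", typeof(string));
    table.Columns.Add("IsActive", typeof(bool));
    foreach (var member in members)
        table.Rows.Add(member.Username, member.IsActive);

    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "Member";
        bulkCopy.BatchSize = 10000;     // rows sent per round trip
        bulkCopy.WriteToServer(table);  // streams rows through the bulk-load API
    }

    WriteToServer also accepts an IDataReader, which avoids materializing a DataTable for very large sources.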

    Dapper Plus

    Disclaimer: I'm the owner of the project Dapper Plus

    This project is not free but offers all bulk operations:

    • BulkInsert
    • BulkUpdate
    • BulkDelete
    • BulkMerge

    (Uses SqlBulkCopy under the hood)

    And some more options such as outputting identity values:

    // CONFIGURE & MAP entity
    DapperPlusManager.Entity<Order>()
                     .Table("Orders")
                     .Identity(x => x.ID);
    
    // CHAIN & SAVE entity
    connection.BulkInsert(orders)
              .AlsoInsert(order => order.Items)
              .Include(x => x.ThenMerge(order => order.Invoice)
                             .AlsoMerge(invoice => invoice.Items))
              .AlsoMerge(x => x.ShippingAddress);
    

    Our library supports multiple providers:

    • SQL Server
    • SQL Compact
    • Oracle
    • MySql
    • PostgreSQL
    • SQLite
    • Firebird
  • 2020-11-29 18:46

    The best I was able to achieve was 50k records in 4 seconds using this approach:

    using (var trans = connection.BeginTransaction())
    {
        connection.Execute(@"
    insert Member(Username, IsActive)
    values(@Username, @IsActive)", members, transaction: trans);

        trans.Commit();
    }
    
  • 2020-11-29 18:46

    I found all these examples incomplete.

    Here is some code that properly closes the connection after use, and correctly uses a TransactionScope to improve Execute performance, based on the more recent and better answers in this thread.

    using (var scope = new TransactionScope()) 
    {
        Connection.Open();
        Connection.Execute(sqlQuery, parameters);
    
        scope.Complete();
    }
    
  • 2020-11-29 18:47

    I created an extension method that would allow you to do a bulk insert very quickly.

    public static class DapperExtensions
    {
        public static async Task BulkInsert<T>(
            this IDbConnection connection,
            string tableName,
            IReadOnlyCollection<T> items,
            Dictionary<string, Func<T, object>> dataFunc)
        {
            const int MaxBatchSize = 1000;
            // Keeps each command safely under SQL Server's 2,100-parameters-per-command limit
            const int MaxParameterSize = 2000;
    
            var batchSize = Math.Min((int)Math.Ceiling((double)MaxParameterSize / dataFunc.Keys.Count), MaxBatchSize);
            var numberOfBatches = (int)Math.Ceiling((double)items.Count / batchSize);
            var columnNames = dataFunc.Keys;
            var insertSql = $"INSERT INTO {tableName} ({string.Join(", ", columnNames.Select(e => $"[{e}]"))}) VALUES ";
            var sqlToExecute = new List<Tuple<string, DynamicParameters>>();
    
            for (var i = 0; i < numberOfBatches; i++)
            {
                var dataToInsert = items.Skip(i * batchSize)
                    .Take(batchSize);
                var valueSql = GetQueries(dataToInsert, dataFunc);
    
                sqlToExecute.Add(Tuple.Create($"{insertSql}{string.Join(", ", valueSql.Item1)}", valueSql.Item2));
            }
    
            foreach (var sql in sqlToExecute)
            {
                await connection.ExecuteAsync(sql.Item1, sql.Item2, commandTimeout: int.MaxValue);
            }
        }
    
        private static Tuple<IEnumerable<string>, DynamicParameters> GetQueries<T>(
            IEnumerable<T> dataToInsert,
            Dictionary<string, Func<T, object>> dataFunc)
        {
            var parameters = new DynamicParameters();
    
            return Tuple.Create(
                dataToInsert.Select(e => $"({string.Join(", ", GenerateQueryAndParameters(e, parameters, dataFunc))})"),
                parameters);
        }
    
        private static IEnumerable<string> GenerateQueryAndParameters<T>(
            T entity,
            DynamicParameters parameters,
            Dictionary<string, Func<T, object>> dataFunc)
        {
            var paramTemplateFunc = new Func<Guid, string>(guid => $"@p{guid.ToString().Replace("-", "")}");
            var paramList = new List<string>();
    
            foreach (var key in dataFunc)
            {
                var paramName = paramTemplateFunc(Guid.NewGuid());
                parameters.Add(paramName, key.Value(entity));
                paramList.Add(paramName);
            }
    
            return paramList;
        }
    }
    

    Then to use this extension method, you would write code like the following:

    await dbConnection.BulkInsert(
        "MySchemaName.MyTableName",
        myCollectionOfItems,
        new Dictionary<string, Func<MyObjectToInsert, object>>
            {
                { "ColumnOne", u => u.ColumnOne },
                { "ColumnTwo", u => u.ColumnTwo },
                ...
            });
    

    This is quite primitive and has further room for improvement, such as passing in a transaction or a commandTimeout value, but it does the trick for me.
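    For instance, a hypothetical variation of the signature that threads those two values through to Dapper (the parameter names are illustrative, and the batch-building code is unchanged from the method above):

    public static async Task BulkInsert<T>(
        this IDbConnection connection,
        string tableName,
        IReadOnlyCollection<T> items,
        Dictionary<string, Func<T, object>> dataFunc,
        IDbTransaction transaction = null,
        int? commandTimeout = null)
    {
        // ... build sqlToExecute exactly as in the original method ...

        foreach (var sql in sqlToExecute)
        {
            // Dapper's ExecuteAsync already accepts both optional arguments
            await connection.ExecuteAsync(sql.Item1, sql.Item2,
                transaction: transaction,
                commandTimeout: commandTimeout ?? int.MaxValue);
        }
    }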

  • 2020-11-29 18:48

    I stumbled across this recently and noticed that the TransactionScope is created after the connection is opened (I assume this since Dapper's Execute doesn't open the connection, unlike Query). According to answer Q4 here: https://stackoverflow.com/a/2886326/455904, the connection will not be enlisted in the TransactionScope. My workmate ran some quick tests, and opening the connection outside the TransactionScope drastically decreased performance.

    So changing to the following should work:

    // Assuming the connection isn't already open
    using (var scope = new TransactionScope())
    {
        connection.Open();
        connection.Execute(@"
    insert Member(Username, IsActive)
    values(@Username, @IsActive)", members);
    
        scope.Complete();
    }
    
  • 2020-11-29 19:00

    The fastest variant for me:

    var dynamicParameters = new DynamicParameters();
    var selects = new List<string>();
    for (var i = 0; i < members.Length; i++)
    {
        var member = members[i];
        var pUsername = $"u{i}";
        var pIsActive = $"a{i}";
        dynamicParameters.Add(pUsername, member.Username);
        dynamicParameters.Add(pIsActive, member.IsActive);
        selects.Add($"select @{pUsername},@{pIsActive}");
    }
    con.Execute($"insert into Member(Username, IsActive) {string.Join(" union all ", selects)}", dynamicParameters);
    

    which generates SQL like:

    INSERT TABLENAME (Column1,Column2,...)
     SELECT @u0,@a0...
     UNION ALL
     SELECT @u1,@a1...
     UNION ALL
     SELECT @u2,@a2...
    

    This query runs faster because SQL Server adds a set of rows at a time instead of one row at a time. The bottleneck is not writing the data; it's writing what you're doing to the log.

    Also, look into the rules for minimally logged transactions.
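    As a sketch of one of those rules: minimally logged bulk loads generally require the SIMPLE or BULK_LOGGED recovery model plus a table lock on the target, and with SqlBulkCopy that lock can be requested via SqlBulkCopyOptions.TableLock (this assumes a connection string named connectionString and a DataTable named table matching the Member schema — neither appears in this answer):

    // TableLock takes a bulk-update (TABLOCK) lock for the duration of the load,
    // which is one prerequisite for minimal logging on a heap
    using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.DestinationTableName = "Member";
        bulkCopy.WriteToServer(table);
    }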
