Question
After reading this article I decided to take a closer look at the way I was using Dapper.
I ran this code on an empty database
var members = new List<Member>();
for (int i = 0; i < 50000; i++)
{
    members.Add(new Member()
    {
        Username = i.ToString(),
        IsActive = true
    });
}
using (var scope = new TransactionScope())
{
    connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);
    scope.Complete();
}
It took about 20 seconds. That's 2,500 inserts/second. Not bad, but not great either, considering the blog was achieving 45k inserts/second. Is there a more efficient way to do this in Dapper?
Also, as a side note, running this code through the Visual Studio debugger took over 3 minutes! I figured the debugger would slow it down a little, but I was really surprised to see that much.
UPDATE
So this
using (var scope = new TransactionScope())
{
    connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);
    scope.Complete();
}
and this
connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);
both took 20 seconds.
But this took 4 seconds!
SqlTransaction trans = connection.BeginTransaction();
connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members, transaction: trans);
trans.Commit();
Answer 1:
The best I was able to achieve was 50k records in 4 seconds using this approach
SqlTransaction trans = connection.BeginTransaction();
connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members, transaction: trans);
trans.Commit();
Answer 2:
I stumbled across this recently and noticed that the TransactionScope is created after the connection is opened (I assume this since Dapper's Execute doesn't open the connection, unlike Query). According to the answer to Q4 here: https://stackoverflow.com/a/2886326/455904, that will not result in the connection being handled by the TransactionScope. A workmate of mine ran some quick tests, and opening the connection outside the TransactionScope drastically decreased performance.
So changing to the following should work:
// Assuming the connection isn't already open
using (var scope = new TransactionScope())
{
    connection.Open();
    connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);
    scope.Complete();
}
Answer 3:
I created an extension method that would allow you to do a bulk insert very quickly.
public static class DapperExtensions
{
    public static async Task BulkInsert<T>(
        this IDbConnection connection,
        string tableName,
        IReadOnlyCollection<T> items,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        const int MaxBatchSize = 1000;
        const int MaxParameterSize = 2000;
        var batchSize = Math.Min((int)Math.Ceiling((double)MaxParameterSize / dataFunc.Keys.Count), MaxBatchSize);
        var numberOfBatches = (int)Math.Ceiling((double)items.Count / batchSize);

        var columnNames = dataFunc.Keys;
        var insertSql = $"INSERT INTO {tableName} ({string.Join(", ", columnNames.Select(e => $"[{e}]"))}) VALUES ";
        var sqlToExecute = new List<Tuple<string, DynamicParameters>>();

        for (var i = 0; i < numberOfBatches; i++)
        {
            var dataToInsert = items.Skip(i * batchSize)
                .Take(batchSize);
            var valueSql = GetQueries(dataToInsert, dataFunc);
            sqlToExecute.Add(Tuple.Create($"{insertSql}{string.Join(", ", valueSql.Item1)}", valueSql.Item2));
        }

        foreach (var sql in sqlToExecute)
        {
            await connection.ExecuteAsync(sql.Item1, sql.Item2, commandTimeout: int.MaxValue);
        }
    }

    private static Tuple<IEnumerable<string>, DynamicParameters> GetQueries<T>(
        IEnumerable<T> dataToInsert,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        var parameters = new DynamicParameters();
        return Tuple.Create(
            dataToInsert.Select(e => $"({string.Join(", ", GenerateQueryAndParameters(e, parameters, dataFunc))})"),
            parameters);
    }

    private static IEnumerable<string> GenerateQueryAndParameters<T>(
        T entity,
        DynamicParameters parameters,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        var paramTemplateFunc = new Func<Guid, string>(guid => $"@p{guid.ToString().Replace("-", "")}");
        var paramList = new List<string>();
        foreach (var key in dataFunc)
        {
            var paramName = paramTemplateFunc(Guid.NewGuid());
            parameters.Add(paramName, key.Value(entity));
            paramList.Add(paramName);
        }
        return paramList;
    }
}
Then to use this extension method, you would write code like the following:
await dbConnection.BulkInsert(
    "MySchemaName.MyTableName",
    myCollectionOfItems,
    new Dictionary<string, Func<MyObjectToInsert, object>>
    {
        { "ColumnOne", u => u.ColumnOne },
        { "ColumnTwo", u => u.ColumnTwo },
        ...
    });
This is quite primitive and has room for improvement, such as passing in a transaction or a commandTimeout value, but it does the trick for me.
Answer 4:
I found all these examples incomplete.
Here is some code that properly closes the connection after use and correctly uses the TransactionScope to improve Execute performance, based on the more recent and better answers in this thread.
using (var scope = new TransactionScope())
{
    Connection.Open();
    Connection.Execute(sqlQuery, parameters);
    scope.Complete();
}
Answer 5:
Using the Execute method with a single insert statement will never do a bulk insert or be efficient. Even the accepted answer with a Transaction doesn't do a bulk insert.
If you want to perform a bulk insert, use SqlBulkCopy: https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy
You will not find anything faster than this.
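For reference, a minimal SqlBulkCopy sketch for the question's Member rows might look like the following. This is an illustrative sketch, not code from the thread: it assumes the same Member table (Username as a string column, IsActive as a bit) and an already-open SqlConnection named connection.

```csharp
// Sketch: stage the rows in a DataTable, then stream them with SqlBulkCopy.
// Assumes an open SqlConnection `connection` and a Member(Username, IsActive) table.
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("Username", typeof(string));
table.Columns.Add("IsActive", typeof(bool));
foreach (var m in members)
    table.Rows.Add(m.Username, m.IsActive);

using (var bulk = new SqlBulkCopy(connection))
{
    bulk.DestinationTableName = "Member";
    bulk.ColumnMappings.Add("Username", "Username");
    bulk.ColumnMappings.Add("IsActive", "IsActive");
    bulk.BatchSize = 10000; // tune for your workload
    bulk.WriteToServer(table);
}
```

For very large sets, streaming an IDataReader instead of building a DataTable avoids holding all rows in memory at once.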
Dapper Plus
Disclaimer: I'm the owner of the project Dapper Plus.
This project is not free but offers all bulk operations:
- BulkInsert
- BulkUpdate
- BulkDelete
- BulkMerge
(It uses SqlBulkCopy under the hood.)
And some more options, such as outputting identity values:
// CONFIGURE & MAP entity
DapperPlusManager.Entity<Order>()
                 .Table("Orders")
                 .Identity(x => x.ID);

// CHAIN & SAVE entity
connection.BulkInsert(orders)
          .AlsoInsert(order => order.Items)
          .Include(x => x.ThenMerge(order => order.Invoice)
                         .AlsoMerge(invoice => invoice.Items))
          .AlsoMerge(x => x.ShippingAddress);
Our library supports multiple providers:
- SQL Server
- SQL Compact
- Oracle
- MySql
- PostgreSQL
- SQLite
- Firebird
Answer 6:
The fastest variant for me:
var dynamicParameters = new DynamicParameters();
var selects = new List<string>();
for (var i = 0; i < members.Count; i++)
{
    var member = members[i];
    var pUsername = $"u{i}";
    var pIsActive = $"a{i}";
    dynamicParameters.Add(pUsername, member.Username);
    dynamicParameters.Add(pIsActive, member.IsActive);
    selects.Add($"select @{pUsername}, @{pIsActive}");
}
con.Execute($"insert into Member(Username, IsActive) {string.Join(" union all ", selects)}", dynamicParameters);
which generates SQL like:
INSERT INTO TABLENAME (Column1, Column2, ...)
SELECT @u0, @a0, ...
UNION ALL
SELECT @u1, @a1, ...
UNION ALL
SELECT @u2, @a2, ...
This query runs faster because SQL Server inserts a set of rows instead of one row at a time. The bottleneck is not writing the data; it's writing what you're doing to the log.
Also, look into the rules for minimally logged transactions.
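One concrete lever related to minimal logging: SqlBulkCopy can request a table lock, which is one of the prerequisites SQL Server checks before minimally logging a bulk load (along with the database being in the simple or bulk-logged recovery model and the target being a heap or an empty table). A hedged sketch, assuming those conditions and a DataTable named table holding the rows:

```csharp
// Sketch: TableLock asks for a bulk-update table lock during the load,
// enabling minimal logging when the recovery model and target table qualify.
using System.Data;
using System.Data.SqlClient;

using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
{
    bulk.DestinationTableName = "Member";
    bulk.WriteToServer(table); // `table` is a DataTable matching the Member schema
}
```

Note the trade-off: the table lock blocks concurrent writers for the duration of the load, so this suits maintenance windows or staging tables more than live OLTP tables.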
Source: https://stackoverflow.com/questions/10689779/bulk-inserts-taking-longer-than-expected-using-dapper