I\'m looking for the fastest way of inserting into Entity Framework.
I\'m asking this because of the scenario where you have an active TransactionScope
a
SqlBulkCopy is super quick
This is my implementation:
// at some point in my calling code, I will call:
var myDataTable = CreateMyDataTable();
myDataTable.Rows.Add(Guid.NewGuid,tableHeaderId,theName,theValue); // e.g. - need this call for each row to insert
var efConnectionString = ConfigurationManager.ConnectionStrings["MyWebConfigEfConnection"].ConnectionString;
var efConnectionStringBuilder = new EntityConnectionStringBuilder(efConnectionString);
var connectionString = efConnectionStringBuilder.ProviderConnectionString;
BulkInsert(connectionString, myDataTable);
private DataTable CreateMyDataTable()
{
var myDataTable = new DataTable { TableName = "MyTable"};
// this table has an identity column - don't need to specify that
myDataTable.Columns.Add("MyTableRecordGuid", typeof(Guid));
myDataTable.Columns.Add("MyTableHeaderId", typeof(int));
myDataTable.Columns.Add("ColumnName", typeof(string));
myDataTable.Columns.Add("ColumnValue", typeof(string));
return myDataTable;
}
private void BulkInsert(string connectionString, DataTable dataTable)
{
using (var connection = new SqlConnection(connectionString))
{
connection.Open();
SqlTransaction transaction = null;
try
{
transaction = connection.BeginTransaction();
using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
{
sqlBulkCopy.DestinationTableName = dataTable.TableName;
foreach (DataColumn column in dataTable.Columns) {
sqlBulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
}
sqlBulkCopy.WriteToServer(dataTable);
}
transaction.Commit();
}
catch (Exception)
{
transaction?.Rollback();
throw;
}
}
}
Use stored procedure that takes input data in form of xml to insert data.
From your c# code pass insert data as xml.
e.g in c#, syntax would be like this:
object id_application = db.ExecuteScalar("procSaveApplication", xml)
I know this is a very old question, but one guy here said that developed an extension method to use bulk insert with EF, and when I checked, I discovered that the library costs $599 today (for one developer). Maybe it makes sense for the entire library, however for just the bulk insert this is too much.
Here is a very simple extension method I made. I use that on pair with database first (do not tested with code first, but I think that works the same). Change YourEntities
with the name of your context:
public partial class YourEntities : DbContext
{
public async Task BulkInsertAllAsync<T>(IEnumerable<T> entities)
{
using (var conn = new SqlConnection(Database.Connection.ConnectionString))
{
await conn.OpenAsync();
Type t = typeof(T);
var bulkCopy = new SqlBulkCopy(conn)
{
DestinationTableName = GetTableName(t)
};
var table = new DataTable();
var properties = t.GetProperties().Where(p => p.PropertyType.IsValueType || p.PropertyType == typeof(string));
foreach (var property in properties)
{
Type propertyType = property.PropertyType;
if (propertyType.IsGenericType &&
propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
{
propertyType = Nullable.GetUnderlyingType(propertyType);
}
table.Columns.Add(new DataColumn(property.Name, propertyType));
}
foreach (var entity in entities)
{
table.Rows.Add(
properties.Select(property => property.GetValue(entity, null) ?? DBNull.Value).ToArray());
}
bulkCopy.BulkCopyTimeout = 0;
await bulkCopy.WriteToServerAsync(table);
}
}
public void BulkInsertAll<T>(IEnumerable<T> entities)
{
using (var conn = new SqlConnection(Database.Connection.ConnectionString))
{
conn.Open();
Type t = typeof(T);
var bulkCopy = new SqlBulkCopy(conn)
{
DestinationTableName = GetTableName(t)
};
var table = new DataTable();
var properties = t.GetProperties().Where(p => p.PropertyType.IsValueType || p.PropertyType == typeof(string));
foreach (var property in properties)
{
Type propertyType = property.PropertyType;
if (propertyType.IsGenericType &&
propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
{
propertyType = Nullable.GetUnderlyingType(propertyType);
}
table.Columns.Add(new DataColumn(property.Name, propertyType));
}
foreach (var entity in entities)
{
table.Rows.Add(
properties.Select(property => property.GetValue(entity, null) ?? DBNull.Value).ToArray());
}
bulkCopy.BulkCopyTimeout = 0;
bulkCopy.WriteToServer(table);
}
}
public string GetTableName(Type type)
{
var metadata = ((IObjectContextAdapter)this).ObjectContext.MetadataWorkspace;
var objectItemCollection = ((ObjectItemCollection)metadata.GetItemCollection(DataSpace.OSpace));
var entityType = metadata
.GetItems<EntityType>(DataSpace.OSpace)
.Single(e => objectItemCollection.GetClrType(e) == type);
var entitySet = metadata
.GetItems<EntityContainer>(DataSpace.CSpace)
.Single()
.EntitySets
.Single(s => s.ElementType.Name == entityType.Name);
var mapping = metadata.GetItems<EntityContainerMapping>(DataSpace.CSSpace)
.Single()
.EntitySetMappings
.Single(s => s.EntitySet == entitySet);
var table = mapping
.EntityTypeMappings.Single()
.Fragments.Single()
.StoreEntitySet;
return (string)table.MetadataProperties["Table"].Value ?? table.Name;
}
}
You can use that against any collection that inherit from IEnumerable
, like that:
await context.BulkInsertAllAsync(items);
This combination increase speed well enough.
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
You should look at using the System.Data.SqlClient.SqlBulkCopy
for this. Here's the documentation, and of course there are plenty of tutorials online.
Sorry, I know you were looking for a simple answer to get EF to do what you want, but bulk operations are not really what ORMs are meant for.
I've investigated Slauma's answer (which is awesome, thanks for the idea man), and I've reduced batch size until I've hit optimal speed. Looking at the Slauma's results:
It is visible that there is speed increase when moving from 1 to 10, and from 10 to 100, but from 100 to 1000 inserting speed is falling down again.
So I've focused on what's happening when you reduce batch size to value somewhere in between 10 and 100, and here are my results (I'm using different row contents, so my times are of different value):
Quantity | Batch size | Interval
1000 1 3
10000 1 34
100000 1 368
1000 5 1
10000 5 12
100000 5 133
1000 10 1
10000 10 11
100000 10 101
1000 20 1
10000 20 9
100000 20 92
1000 27 0
10000 27 9
100000 27 92
1000 30 0
10000 30 9
100000 30 92
1000 35 1
10000 35 9
100000 35 94
1000 50 1
10000 50 10
100000 50 106
1000 100 1
10000 100 14
100000 100 141
Based on my results, actual optimum is around value of 30 for batch size. It's less than both 10 and 100. Problem is, I have no idea why is 30 optimal, nor could have I found any logical explanation for it.