Fastest Way of Inserting in Entity Framework

鱼传尺愫 2020-11-21 05:23

I'm looking for the fastest way of inserting into Entity Framework.

I'm asking this because of the scenario where you have an active TransactionScope and the insertion is huge (4000+ records). It can potentially last more than 10 minutes (the default maximum timeout of transactions), and this will lead to an incomplete transaction.

30 Answers
  • 2020-11-21 05:43

    To your remark in the comments to your question:

    "...SavingChanges (for each record)..."

    That's the worst thing you can do! Calling SaveChanges() for each record slows bulk inserts down dramatically. I would run a few simple tests which will very likely improve the performance:

    • Call SaveChanges() once after ALL records.
    • Call SaveChanges() after, for example, 100 records.
    • Call SaveChanges() after, for example, 100 records, then dispose the context and create a new one.
    • Disable change detection.

    For bulk inserts I am working and experimenting with a pattern like this:

    using (TransactionScope scope = new TransactionScope())
    {
        MyDbContext context = null;
        try
        {
            context = new MyDbContext();
            // Change detection is expensive with many tracked entities; turn it off.
            context.Configuration.AutoDetectChangesEnabled = false;

            int count = 0;
            foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
            {
                ++count;
                // Commit in batches of 100 and recreate the context after each batch.
                context = AddToContext(context, entityToInsert, count, 100, true);
            }

            context.SaveChanges();
        }
        finally
        {
            if (context != null)
                context.Dispose();
        }

        scope.Complete();
    }

    private MyDbContext AddToContext(MyDbContext context,
        Entity entity, int count, int commitCount, bool recreateContext)
    {
        context.Set<Entity>().Add(entity);

        if (count % commitCount == 0)
        {
            context.SaveChanges();
            if (recreateContext)
            {
                // A fresh context drops all tracked entities and keeps inserts fast.
                context.Dispose();
                context = new MyDbContext();
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }

        return context;
    }
    

    I have a test program which inserts 560,000 entities (9 scalar properties, no navigation properties) into the DB. With this code it completes in less than 3 minutes.
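
    For reference, a minimal sketch of what the Entity and MyDbContext types used above might look like; the property names here are invented for illustration (the actual test entity had 9 scalar properties):

    public class Entity
    {
        public int Id { get; set; }
        public string Name { get; set; }          // illustrative scalar property
        public DateTime CreatedAt { get; set; }   // illustrative scalar property
        // ... more scalar properties in the actual test
    }

    public class MyDbContext : DbContext
    {
        public DbSet<Entity> Entities { get; set; }
    }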

    For performance it is important to call SaveChanges() after "many" records ("many" meaning around 100 or 1,000). Disposing the context after SaveChanges and creating a new one also improves performance: it clears all entities from the context. SaveChanges doesn't do that; the entities remain attached to the context in state Unchanged. It is the growing number of attached entities in the context that slows down insertion step by step, so it helps to clear the context after a while.
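
    A minimal sketch (using the hypothetical Entity and MyDbContext above) illustrating this point - after SaveChanges() the entity is not detached, it remains tracked in state Unchanged:

    using (var context = new MyDbContext())
    {
        context.Set<Entity>().Add(new Entity());
        context.SaveChanges();

        // The entry is still tracked, now in state Unchanged (not Detached);
        // this is what makes the context grow with every batch.
        var state = context.ChangeTracker.Entries().Single().State;
        Console.WriteLine(state); // Unchanged
    }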

    Here are a few measurements for my 560,000 entities:

    • commitCount = 1, recreateContext = false: many hours (That's your current procedure)
    • commitCount = 100, recreateContext = false: more than 20 minutes
    • commitCount = 1000, recreateContext = false: 242 sec
    • commitCount = 10000, recreateContext = false: 202 sec
    • commitCount = 100000, recreateContext = false: 199 sec
    • commitCount = 1000000, recreateContext = false: out of memory exception
    • commitCount = 1, recreateContext = true: more than 10 minutes
    • commitCount = 10, recreateContext = true: 241 sec
    • commitCount = 100, recreateContext = true: 164 sec
    • commitCount = 1000, recreateContext = true: 191 sec

    The behaviour in the first test above is that the performance is very non-linear and degrades severely over time. ("Many hours" is an estimate; I never finished this test and stopped at 50,000 entities after 20 minutes.) This non-linear behaviour is not as significant in the other tests.

  • 2020-11-21 05:43

    Another option is to use SqlBulkTools, available from NuGet. It's very easy to use and has some powerful features.

    Example:

    var bulk = new BulkOperations();
    var books = GetBooks();
    
    using (TransactionScope trans = new TransactionScope())
    {
        using (SqlConnection conn = new SqlConnection(ConfigurationManager
        .ConnectionStrings["SqlBulkToolsTest"].ConnectionString))
        {
            bulk.Setup<Book>()
                .ForCollection(books)
                .WithTable("Books") 
                .AddAllColumns()
                .BulkInsert()
                .Commit(conn);
        }
    
        trans.Complete();
    }
    

    See the documentation for more examples and advanced usage. Disclaimer: I am the author of this library, and any views expressed are my own opinion.

  • 2020-11-21 05:43

    You may use the Bulk package library. The Bulk Insert 1.0.0 version is used in projects with Entity Framework >= 6.0.0.

    More details can be found here: Bulkoperation source code

  • 2020-11-21 05:47

    TL;DR I know it is an old post, but I have implemented a solution starting from one of those proposed, extending it and solving some of its problems. I have also read the other solutions presented, and compared to these my solution seems much better suited to the requests formulated in the original question.

    In this solution I extend Slauma's approach, which I would say is perfect for the case proposed in the original question: using Entity Framework and TransactionScope for an expensive write operation on the db.

    In Slauma's solution - which incidentally was a draft, used only to get an idea of the speed of EF with a bulk-insert strategy - there were problems due to:

    1. the timeout of the transaction (by default 1 minute, extendable via code to a maximum of 10 minutes);
    2. the duplication of the first block of data, with a width equal to the commit size, at the end of the transaction (this problem is quite odd and is circumvented by means of a workaround).

    I also extended the case study presented by Slauma by reporting an example that includes the insertion of several dependent entities in the same transaction.

    The performance I have been able to verify was 10K records/min, inserting into the db a block of 200K records of approximately 1 KB each. The speed was constant, there was no degradation in performance, and the test took about 20 minutes to run successfully.

    The solution in detail

    The method that drives the bulk-insert operation, placed in an example repository class:

    abstract class SomeRepository { 
    
        protected MyDbContext myDbContextRef;
    
        public void ImportData<TChild, TFather>(List<TChild> entities, TFather entityFather)
                where TChild : class, IEntityChild
                where TFather : class, IEntityFather
        {
    
            using (var scope = MyDbContext.CreateTransactionScope())
            {
    
                MyDbContext context = null;
                try
                {
                    context = new MyDbContext(myDbContextRef.ConnectionString);
    
                    context.Configuration.AutoDetectChangesEnabled = false;
    
                    entityFather.BulkInsertResult = false;
                    var fileEntity = context.Set<TFather>().Add(entityFather);
                    context.SaveChanges();
    
                    int count = 0;
    
                    //avoids an issue with recreating context: EF duplicates the first commit block of data at the end of transaction!!
                    context = MyDbContext.AddToContext<TChild>(context, null, 0, 1, true);
    
                    foreach (var entityToInsert in entities)
                    {
                        ++count;
                        entityToInsert.EntityFatherRefId = fileEntity.Id;
                        context = MyDbContext.AddToContext<TChild>(context, entityToInsert, count, 100, true);
                    }
    
                    entityFather.BulkInsertResult = true;
                    context.Set<TFather>().Add(fileEntity);
                    context.Entry<TFather>(fileEntity).State = EntityState.Modified;
    
                    context.SaveChanges();
                }
                finally
                {
                    if (context != null)
                        context.Dispose();
                }
    
                scope.Complete();
            }
    
        }
    
    }
    

    Interfaces used for example purposes only:

    public interface IEntityChild {
    
        //some properties ...
    
        int EntityFatherRefId { get; set; }
    
    }
    
    public interface IEntityFather {
    
        int Id { get; set; }
        bool BulkInsertResult { get; set; }
    }
    

    The db context, where I implemented the various elements of the solution as static methods:

    public class MyDbContext : DbContext
    {
    
        public string ConnectionString { get; set; }
    
    
        public MyDbContext(string nameOrConnectionString)
        : base(nameOrConnectionString)
        {
            Database.SetInitializer<MyDbContext>(null);
            ConnectionString = Database.Connection.ConnectionString;
        }
    
    
        /// <summary>
        /// Creates a TransactionScope, raising the transaction timeout to 30 minutes
        /// </summary>
        /// <param name="_isolationLevel"></param>
        /// <param name="timeout"></param>
        /// <remarks>
        /// It is possible to set isolation-level and timeout to different values. Pay close attention when managing these two transaction parameters.
        /// <para>Default TransactionScope values for isolation-level and timeout are the following:</para>
        /// <para>Default isolation-level is "Serializable"</para>
        /// <para>Default timeout ranges from 1 minute (default value if no timeout is specified) to a maximum of 10 minutes (if not changed by code or by updating the max-timeout machine.config value)</para>
        /// </remarks>
        public static TransactionScope CreateTransactionScope(IsolationLevel _isolationLevel = IsolationLevel.Serializable, TimeSpan? timeout = null)
        {
            SetTransactionManagerField("_cachedMaxTimeout", true);
            SetTransactionManagerField("_maximumTimeout", timeout ?? TimeSpan.FromMinutes(30));
    
            var transactionOptions = new TransactionOptions();
            transactionOptions.IsolationLevel = _isolationLevel;
            transactionOptions.Timeout = TransactionManager.MaximumTimeout;
            return new TransactionScope(TransactionScopeOption.Required, transactionOptions);
        }
    
        private static void SetTransactionManagerField(string fieldName, object value)
        {
            // Reflection hack: overwrites a private static field of TransactionManager
            // to lift the machine.config maximum transaction timeout.
            typeof(TransactionManager).GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Static).SetValue(null, value);
        }
    
    
        /// <summary>
        /// Adds a generic entity to a given context allowing commit on large block of data and improving performance to support db bulk-insert operations based on Entity Framework
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <param name="context"></param>
        /// <param name="entity"></param>
        /// <param name="count"></param>
        /// <param name="commitCount">defines the block of data size</param>
        /// <param name="recreateContext"></param>
        /// <returns></returns>
        public static MyDbContext AddToContext<T>(MyDbContext context, T entity, int count, int commitCount, bool recreateContext) where T : class
        {
            if (entity != null)
                context.Set<T>().Add(entity);
    
            if (count % commitCount == 0)
            {
                context.SaveChanges();
                if (recreateContext)
                {
                    var contextConnectionString = context.ConnectionString;
                    context.Dispose();
                    context = new MyDbContext(contextConnectionString);
                    context.Configuration.AutoDetectChangesEnabled = false;
                }
            }
    
            return context;
        }
    }
    
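    To make the usage concrete, a minimal sketch of how this repository might be called; MeasurementRepository, ImportFile, Measurement and LoadMeasurements are hypothetical names invented here only to show the call (ImportFile implements IEntityFather, Measurement implements IEntityChild):

    // Hypothetical concrete repository, for illustration only.
    class MeasurementRepository : SomeRepository
    {
        public MeasurementRepository(MyDbContext context)
        {
            myDbContextRef = context; // ImportData reads its ConnectionString
        }
    }

    // Usage:
    var repository = new MeasurementRepository(new MyDbContext("name=MyDb"));
    var father = new ImportFile();                    // Id is assigned by the first SaveChanges()
    List<Measurement> children = LoadMeasurements();  // EntityFatherRefId is set inside ImportData
    repository.ImportData<Measurement, ImportFile>(children, father);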
  • 2020-11-21 05:48

    [NEW SOLUTION FOR POSTGRESQL] I know it's quite an old post, but I have recently run into a similar problem, using Postgresql, and wanted to use an effective bulk insert, which turned out to be pretty difficult. I haven't found any proper free library to do so for this DB. I have only found this helper: https://bytefish.de/blog/postgresql_bulk_insert/ which is also on NuGet. I have written a small mapper which auto-maps properties the same way Entity Framework does:

    public static PostgreSQLCopyHelper<T> CreateHelper<T>(string schemaName, string tableName)
    {
        var helper = new PostgreSQLCopyHelper<T>(schemaName, "\"" + tableName + "\"");
        var properties = typeof(T).GetProperties();
        foreach (var prop in properties)
        {
            var type = prop.PropertyType;
            // Skip keys and foreign keys; they are not copied.
            if (Attribute.IsDefined(prop, typeof(KeyAttribute)) || Attribute.IsDefined(prop, typeof(ForeignKeyAttribute)))
                continue;
            switch (type)
            {
                case Type intType when intType == typeof(int) || intType == typeof(int?):
                    helper = helper.MapInteger("\"" + prop.Name + "\"", x => (int?)prop.GetValue(x, null));
                    break;
                case Type stringType when stringType == typeof(string):
                    helper = helper.MapText("\"" + prop.Name + "\"", x => (string)prop.GetValue(x, null));
                    break;
                case Type dateType when dateType == typeof(DateTime) || dateType == typeof(DateTime?):
                    helper = helper.MapTimeStamp("\"" + prop.Name + "\"", x => (DateTime?)prop.GetValue(x, null));
                    break;
                case Type decimalType when decimalType == typeof(decimal) || decimalType == typeof(decimal?):
                    helper = helper.MapMoney("\"" + prop.Name + "\"", x => (decimal?)prop.GetValue(x, null));
                    break;
                case Type doubleType when doubleType == typeof(double) || doubleType == typeof(double?):
                    helper = helper.MapDouble("\"" + prop.Name + "\"", x => (double?)prop.GetValue(x, null));
                    break;
                case Type floatType when floatType == typeof(float) || floatType == typeof(float?):
                    helper = helper.MapReal("\"" + prop.Name + "\"", x => (float?)prop.GetValue(x, null));
                    break;
                case Type guidType when guidType == typeof(Guid):
                    helper = helper.MapUUID("\"" + prop.Name + "\"", x => (Guid)prop.GetValue(x, null));
                    break;
            }
        }
        return helper;
    }
    

    I use it the following way (I had entity named Undertaking):

    var undertakingHelper = BulkMapper.CreateHelper<Model.Undertaking>("dbo", nameof(Model.Undertaking));
    undertakingHelper.SaveAll(transaction.UnderlyingTransaction.Connection as Npgsql.NpgsqlConnection, undertakingsToAdd);
    

    I showed an example with a transaction, but it can also be done with a normal connection retrieved from the context. undertakingsToAdd is an enumerable of normal entity records which I want to bulk-insert into the DB.
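
    As a sketch of the "normal connection" variant (assuming an EF6 DbContext instance named context; the connection must be open before SaveAll is called):

    // Obtain the underlying Npgsql connection directly from the EF6 context.
    var connection = context.Database.Connection as Npgsql.NpgsqlConnection;
    if (connection.State != System.Data.ConnectionState.Open)
        connection.Open();

    undertakingHelper.SaveAll(connection, undertakingsToAdd);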

    This solution, which I arrived at after a few hours of research and trying, is, as you could expect, much faster and finally easy to use and free! I really advise you to use it, not only for the reasons mentioned above, but also because it's the only one that gave me no problems with Postgresql itself; many other solutions that work flawlessly with, for example, SqlServer did not.

  • 2020-11-21 05:50

    Disposing the context creates problems if the entities you Add() rely on other preloaded entities (e.g. navigation properties) in the context.

    I use a similar concept to keep my context small and achieve the same performance.

    But instead of disposing the context and recreating it, I simply detach the entities that SaveChanges() has already committed:

    public void AddAndSave<TEntity>(List<TEntity> entities) where TEntity : class
    {
        const int CommitCount = 1000; //set your own best-performance number here
        int currentCount = 0;

        while (currentCount < entities.Count)
        {
            //make sure we don't commit more entities than we have
            int commitCount = CommitCount;
            if ((entities.Count - currentCount) < commitCount)
                commitCount = entities.Count - currentCount;

            //e.g. add entities [ i = 0 to 999, 1000 to 1999, ..., n to n+999 ] to the context
            for (int i = currentCount; i < (currentCount + commitCount); i++)
                _context.Entry(entities[i]).State = System.Data.EntityState.Added;
                //same as calling _context.Set<TEntity>().Add(entities[i]);

            //commit entities[n to n+999] to the database
            _context.SaveChanges();

            //detach all entities in the context that were committed to the database
            //so they won't overload the context
            for (int i = currentCount; i < (currentCount + commitCount); i++)
                _context.Entry(entities[i]).State = System.Data.EntityState.Detached;

            currentCount += commitCount;
        }
    }
    

    Wrap it with try/catch and a TransactionScope() if you need them; they are not shown above, to keep the code clean.
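
    For completeness, a minimal sketch of that wrapping (the error handling is illustrative only; books stands for any List<Book> of entities):

    using (var scope = new TransactionScope())
    {
        try
        {
            AddAndSave(books); // the method above

            // Not calling Complete() (e.g. when an exception is thrown)
            // rolls the whole transaction back.
            scope.Complete();
        }
        catch
        {
            // log/handle here; rethrow so callers see the failure
            throw;
        }
    }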
