Need to insert 100,000 rows in MySQL using Hibernate in under 5 seconds

我寻月下人不归 2021-01-30 23:38

I am trying to insert 100,000 rows into a MySQL table in under 5 seconds using Hibernate (JPA). I have tried every trick Hibernate offers and still cannot do better than 35 seconds.

4 Answers
  • 2021-01-30 23:53

    Uff. You can do a lot of things to increase speed.

    1.) Use @DynamicInsert and @DynamicUpdate so that Hibernate generates INSERT statements containing only the non-null columns and UPDATE statements containing only the changed columns.

    2.) Try inserting the rows directly (plain JDBC, without Hibernate) into your database to check whether Hibernate really is the bottleneck.

    3.) Use a SessionFactory and commit your transaction only every e.g. 100 inserts. Or open and close the transaction once and flush your data every 100 inserts.

    4.) Use the ID generation strategy "sequence" and let Hibernate preallocate the IDs (via the allocationSize parameter).

    5.) Use caches.

    Some of these possible solutions can have timing disadvantages when not used correctly, but you have a lot of opportunities. A sketch of points 1 and 4 follows below.
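
    As an illustration of points 1 and 4, here is a minimal entity sketch. The class, sequence name, and allocation size are made up; note that MySQL has no native sequences, so Hibernate typically emulates this generator with a table, and that identity-style ids prevent Hibernate from batching inserts at all.

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.SequenceGenerator;
    import org.hibernate.annotations.DynamicInsert;
    import org.hibernate.annotations.DynamicUpdate;

    @Entity
    @DynamicInsert   // generated INSERTs contain only the non-null columns
    @DynamicUpdate   // generated UPDATEs contain only the changed columns
    public class MyObject {

        @Id
        @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "myobject_seq")
        @SequenceGenerator(name = "myobject_seq", sequenceName = "myobject_seq",
                           allocationSize = 100) // Hibernate hands out 100 ids per round trip
        private Long id;

        private String name;

        // getters and setters omitted
    }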

  • 2021-01-31 00:02
    1. You are using Spring to manage the transaction but break it by using thread as the current session context. When Spring manages your transactions, don't mess around with the hibernate.current_session_context_class property; remove it.

    2. Don't use DriverManagerDataSource; use a proper connection pool like HikariCP.

    3. In your for loop, flush and clear the EntityManager at regular intervals, preferably matching your batch size. If you don't, a single persist takes longer and longer, because on each persist Hibernate checks the first-level cache for dirty objects, and the more objects it holds the more time that takes. With 10 or 100 entities this is acceptable, but checking tens of thousands of objects on every persist will take its toll.


    @Service
    @Transactional
    public class ServiceImpl implements MyService {

        @Autowired
        private MyDao dao;

        @PersistenceContext
        private EntityManager em;

        public void foo() {
            int count = 0;
            for (MyObject d : listOfObjects_100000) {
                dao.persist(d);
                count++;
                // flush and clear at the same interval as hibernate.jdbc.batch_size
                // so the persistence context does not grow unbounded
                if (count % 30 == 0) {
                    em.flush();
                    em.clear();
                }
            }
        }
    }

    For a more in-depth explanation see this blog and this blog.
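
    To make points 2 and 3 concrete, here is a rough configuration sketch, assuming Java-based Spring config. The pool size, package name, and batch size of 30 are only illustrative; rewriteBatchedStatements=true additionally lets the MySQL driver rewrite batched inserts into multi-row statements.

    import java.util.Properties;
    import javax.sql.DataSource;
    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
    import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter;

    @Configuration
    public class PersistenceConfig {

        @Bean
        public DataSource dataSource() {
            HikariConfig cfg = new HikariConfig();
            cfg.setJdbcUrl("jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true");
            cfg.setUsername("user");
            cfg.setPassword("secret");
            cfg.setMaximumPoolSize(10);
            // point 2: a real connection pool instead of DriverManagerDataSource
            return new HikariDataSource(cfg);
        }

        @Bean
        public LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource ds) {
            LocalContainerEntityManagerFactoryBean emf = new LocalContainerEntityManagerFactoryBean();
            emf.setDataSource(ds);
            emf.setPackagesToScan("com.example.model");
            emf.setJpaVendorAdapter(new HibernateJpaVendorAdapter());

            Properties props = new Properties();
            props.put("hibernate.jdbc.batch_size", "30");  // same interval as the flush/clear above
            props.put("hibernate.order_inserts", "true");  // group inserts by entity for better batching
            // point 1: no hibernate.current_session_context_class property here
            emf.setJpaProperties(props);
            return emf;
        }
    }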

  • 2021-01-31 00:09

    Another option to consider is StatelessSession:

    A command-oriented API for performing bulk operations against a database.

    A stateless session does not implement a first-level cache nor interact with any second-level cache, nor does it implement transactional write-behind or automatic dirty checking, nor do operations cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate's event model and interceptors. Stateless sessions are vulnerable to data aliasing effects, due to the lack of a first-level cache.

    For certain kinds of transactions, a stateless session may perform slightly faster than a stateful session.

    Related discussion: Using StatelessSession for Batch processing
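
    A minimal usage sketch, assuming a Hibernate SessionFactory is available; the entity and class names are illustrative:

    import java.util.List;
    import org.hibernate.SessionFactory;
    import org.hibernate.StatelessSession;
    import org.hibernate.Transaction;

    public class StatelessBulkInserter {

        private final SessionFactory sessionFactory;

        public StatelessBulkInserter(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public void insertAll(List<MyObject> rows) {
            // no first-level cache, no dirty checking, no cascades
            StatelessSession session = sessionFactory.openStatelessSession();
            Transaction tx = session.beginTransaction();
            try {
                for (MyObject row : rows) {
                    session.insert(row); // issues the INSERT immediately
                }
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            } finally {
                session.close();
            }
        }
    }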

  • 2021-01-31 00:10

    After trying all the possible solutions, I finally found a way to insert 100,000 rows in under 5 seconds!

    Things I tried:

    1) Replaced Hibernate/database AUTO_INCREMENT/generated IDs with self-generated IDs using an AtomicInteger

    2) Enabled batch inserts with batch_size=50

    3) Flushed and cleared the persistence context after every batch_size persist() calls

    4) Multithreading (did not attempt this one)

    Finally, what worked was using a native multi-row insert query, inserting 1,000 rows per SQL INSERT instead of calling persist() on every entity. To insert 100,000 entities, I build native queries like "INSERT INTO MyTable VALUES (x,x,x),(x,x,x),...,(x,x,x)" [1,000 rows per INSERT statement].

    Now it takes around 3 seconds to insert 100,000 records, so the bottleneck was the ORM itself! For bulk inserts, the only thing that seems to work is native insert queries, as sketched below.
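
    For reference, a rough sketch of that approach with a JPA native query. The table, columns, getters, and the 1,000-rows-per-statement figure are illustrative, and the values are bound as ordinal parameters instead of being concatenated into the SQL string.

    import java.util.List;
    import javax.persistence.EntityManager;
    import javax.persistence.Query;

    public class NativeBatchInserter {

        private static final int ROWS_PER_STATEMENT = 1000;

        public void insertAll(EntityManager em, List<MyObject> rows) {
            for (int from = 0; from < rows.size(); from += ROWS_PER_STATEMENT) {
                List<MyObject> chunk = rows.subList(from, Math.min(from + ROWS_PER_STATEMENT, rows.size()));

                // Build "INSERT INTO MyTable (col_a, col_b) VALUES (?1,?2),(?3,?4),..."
                StringBuilder sql = new StringBuilder("INSERT INTO MyTable (col_a, col_b) VALUES ");
                int ordinal = 1;
                for (int i = 0; i < chunk.size(); i++) {
                    if (i > 0) sql.append(',');
                    sql.append("(?").append(ordinal++).append(",?").append(ordinal++).append(')');
                }

                Query q = em.createNativeQuery(sql.toString());
                int p = 1;
                for (MyObject row : chunk) {
                    q.setParameter(p++, row.getColA()); // hypothetical getters
                    q.setParameter(p++, row.getColB());
                }
                q.executeUpdate(); // one round trip inserts up to 1,000 rows
            }
        }
    }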
