How does Hibernate batch insert work?

攒了一身酷 2021-01-20 04:07

Can someone explain to me how

hibernate.jdbc.batch_size=1000 

and

if (i % 100 == 0 && i>0) {
                           


        
3 answers
  • 2021-01-20 04:42

    Batch Processing allows you to group related SQL statements into a batch and submit them with one call to the database.

    Why we need it

    Keep in mind that each update added to a Statement or PreparedStatement is executed separately by the database. Some of them may therefore succeed before one of them fails. The statements that succeeded are then applied to the database while the rest may not be, which can leave the data in an inconsistent state.

    To avoid this, execute the batch update inside a transaction. Within a transaction, either all of the updates are applied or none are: if one update fails, the updates that already succeeded can be rolled back.
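    As a minimal sketch of the point above, the following plain JDBC code queues several inserts in one batch and runs them inside a single transaction. The table person and its name column are hypothetical, and the caller is assumed to supply an open Connection; this is not how Hibernate itself is implemented, only an illustration of batching plus rollback.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class JdbcBatchInsert {

    // Inserts all names as one JDBC batch inside a single transaction.
    // The table "person" and the column "name" are hypothetical.
    public static int[] insertAll(Connection conn, List<String> names) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // start a transaction
        try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO person (name) VALUES (?)")) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch(); // queue the statement, nothing is executed yet
            }
            int[] counts = ps.executeBatch(); // submit every queued statement with one call
            conn.commit();                    // all inserts become permanent together
            return counts;
        } catch (SQLException e) {
            conn.rollback(); // undo the inserts that did succeed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's setting
        }
    }
}
```

    If any statement in the batch fails, the rollback in the catch block discards the ones that succeeded, so the table never ends up half updated.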

    What is Batch and Flushing

    Batch size and flushing are two different things. Setting hibernate.jdbc.batch_size to 1000 means Hibernate will group up to 1000 insert or update statements into a single batch. A flush writes all pending changes to the database before the transaction is committed.

    If your batch size is set to 1000 and you flush every 100 entities, each flush sends a batch of only 100 statements, so Hibernate executes 10 small batches of 100 insert or update statements instead of one batch of 1000.

    For more details, see:

    http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html

    Why number of objects being flushed should be equal to hibernate.jdbc.batch_size?

  • 2021-01-20 05:07

    The Hibernate property hibernate.jdbc.batch_size lets Hibernate optimize your insert and update statements, whereas the flushing loop is about avoiding memory exhaustion.

    Without a batch size, Hibernate fires one insert statement each time you save an entity, so with a big collection every save results in its own statement.

    Imagine the following chunk of code:

    for (Entity e : entities) {
        session.save(e);
    }
    

    Here Hibernate fires one insert statement per entity in your collection: with 100 elements, 100 insert statements are fired. This approach is not very efficient, for two main reasons:

    • 1) The first-level cache grows with every saved entity, so with a large collection you will probably end up with an OutOfMemoryError.
    • 2) Performance degrades because each statement costs a network round trip.

    hibernate.jdbc.batch_size and the flushing loop have two different purposes, but they are complementary.

    Hibernate uses the first to control how many statements go into a batch. Under the covers, Hibernate uses the java.sql.Statement.addBatch(...) and executeBatch() methods.

    So hibernate.jdbc.batch_size tells Hibernate how many times it has to call addBatch() before calling executeBatch().

    Setting this property therefore does not protect you from memory exhaustion.

    To take care of memory, you have to flush and clear your session on a regular basis, and this is the purpose of the flushing loop.

    When you write:

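    int i = 0;
    for (Entity e : entities) {
        session.save(e);
        i++;
        // every 100 saved entities, write the pending inserts and empty the session
        if (i % 100 == 0 && i > 0) {
            session.flush();
            session.clear();
        }
    }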
    

    you're telling Hibernate to flush and clear the session every 100 entities (which releases memory).

    So what is the link between the two?

    To be optimal, set your jdbc.batch_size and your flush interval to the same value.

    If you choose a flush interval lower than the batch_size, Hibernate flushes the session more frequently, so it executes small batches that never reach the batch size, which is not efficient.

    When the two are the same, Hibernate only executes batches of the optimal size, except for the last one if the size of the collection is not a multiple of your batch_size.

    You can see the following post for more details about this last point

  • 2021-01-20 05:07

    hibernate.jdbc.batch_size determines the maximum size of a batch that is executed. If an implicit or explicit flush is performed before the specified batch size is reached (counted as the number of pending insert or update statements for the same table), all pending statements are packed into one batch, and the 'accumulation' of statements starts again.

    So, in your example you would execute batches consisting of 100 statements each. Or, for example, if the batch size were 100 and the modulo divider were 500, when the flush operation occurs you would execute 5 batches consisting of 100 statements each.
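    The arithmetic above can be checked with a small simulation. This is a simplified model written for illustration, not Hibernate code: it assumes a single entity type, that statements only reach JDBC at a flush (or at the end of the loop), and that each flush drains the pending statements in chunks of at most the batch size. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFlushSimulation {

    // Returns the size of every batch that would be executed when saving
    // `total` entities, given the maximum batch size and the flush interval.
    // Simplified model: statements are only executed when a flush happens.
    public static List<Integer> executedBatches(int total, int batchSize, int flushEvery) {
        if (batchSize <= 0 || flushEvery <= 0) {
            throw new IllegalArgumentException("batchSize and flushEvery must be positive");
        }
        List<Integer> batches = new ArrayList<>();
        int pending = 0; // statements waiting for the next flush
        for (int i = 1; i <= total; i++) {
            pending++; // one save() queues one insert
            boolean flushNow = i % flushEvery == 0 || i == total;
            if (flushNow) {
                // a flush drains everything pending, at most batchSize per batch
                while (pending > 0) {
                    int size = Math.min(pending, batchSize);
                    batches.add(size);
                    pending -= size;
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(executedBatches(1000, 1000, 100)); // batch size 1000, flush every 100
        System.out.println(executedBatches(500, 100, 500));   // batch size 100, flush every 500
        System.out.println(executedBatches(250, 100, 100));   // both set to 100
    }
}
```

    Under this model, a batch size of 1000 with a flush every 100 entities gives ten batches of 100 for 1000 entities, a batch size of 100 with a flush every 500 gives five batches of 100, and matching values of 100 for 250 entities give batches of 100, 100 and 50.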
