Can someone explain how
hibernate.jdbc.batch_size=1000
and
if (i % 100 == 0 && i > 0) {
relate to each other?
Batch processing allows you to group related SQL statements into a batch and submit them to the database with a single call.
Why we need it
It is important to keep in mind that each update added to a Statement or PreparedStatement is executed separately by the database. That means some of them may succeed before one of them fails. All the statements that succeeded are then applied to the database, but the rest of the updates may not be. This can result in inconsistent data in the database.
To avoid this, you can execute the batch update inside a transaction. When executed inside a transaction, you can make sure that either all updates are executed or none are. Any successful updates can be rolled back in case one of the updates fails.
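The all-or-none behaviour can be sketched without a real database. This is only a toy in-memory model (the `AllOrNothingBatch` class, its `applyAll` helper, and the null-row failure condition are invented for illustration); real code would use java.sql.Connection with setAutoCommit(false), addBatch()/executeBatch(), and commit()/rollback():

```java
import java.util.ArrayList;
import java.util.List;

public class AllOrNothingBatch {
    // Toy "table": a list of rows standing in for a real database table.
    static List<String> table = new ArrayList<>();

    /** Applies every update or none: changes become visible only on "commit". */
    static boolean applyAll(List<String> updates) {
        List<String> workingCopy = new ArrayList<>(table); // like an open transaction
        for (String row : updates) {
            if (row == null) {      // a statement failing mid-batch
                return false;       // "rollback": the table is left untouched
            }
            workingCopy.add(row);
        }
        table = workingCopy;        // "commit": all changes become visible at once
        return true;
    }

    public static void main(String[] args) {
        applyAll(List.of("a", "b"));                                    // succeeds
        boolean ok = applyAll(java.util.Arrays.asList("c", null, "d")); // fails mid-batch
        System.out.println(ok + " " + table); // prints "false [a, b]"
    }
}
```

The point of the model is that the second call fails halfway through, yet none of its rows leak into the table, which is exactly what wrapping a batch in a transaction buys you.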
What are batch size and flushing?
Batch size and flushing are two different things. When you set hibernate.jdbc.batch_size
to 1000,
it means Hibernate will group inserts or updates into JDBC batches of up to 1000 entities. The flush
operation can be used to write all pending changes to the database before the transaction is committed.
If your batch size is set to 1000 but you flush every 100 entities, Hibernate will instead execute lots of small batches of 100 insert or update statements: for example, with 1,000 entities it executes 10 such batches.
You can read more at this link:
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html
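For reference, JDBC batching is usually switched on through configuration properties along these lines (the values are illustrative; hibernate.order_inserts and hibernate.order_updates make Hibernate group statements for the same table together, which helps batching actually kick in):

```properties
# Send up to 1000 statements per executeBatch() call
hibernate.jdbc.batch_size=1000
# Order inserts/updates by entity so statements for the same table batch together
hibernate.order_inserts=true
hibernate.order_updates=true
```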
Why should the number of objects being flushed be equal to hibernate.jdbc.batch_size?
The Hibernate property hibernate.jdbc.batch_size
is a way for Hibernate to optimize your insert or update statements, whereas the flushing loop is about avoiding memory exhaustion.
Without a batch size, when you try to save an entity, Hibernate fires one insert statement; so if you work with a big collection, Hibernate fires one statement for each save.
Imagine the following chunk of code :
for (Entity e : entities) {
    session.save(e);
}
Here Hibernate will fire one insert statement per entity in your collection: if you have 100 elements, 100 insert statements will be fired. This approach is not very efficient, for two main reasons: every statement is a separate round trip to the database, and the session keeps accumulating every saved entity in memory, which can eventually cause an
OutOfMemoryError.
hibernate.jdbc.batch_size and the flushing loop have two different purposes, but they are complementary.
Hibernate uses the first to control how many entities go into one batch. Under the cover, Hibernate uses the java.sql.Statement.addBatch(...)
and executeBatch()
methods.
So hibernate.jdbc.batch_size tells Hibernate how many times it has to call addBatch()
before calling executeBatch().
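A rough model of that counting, in plain Java rather than real JDBC (the `BatchRecorder` class is invented for illustration; it only mimics how a batch fills up and gets executed):

```java
import java.util.ArrayList;
import java.util.List;

// Mimics the JDBC batching rhythm: addBatch() accumulates pending statements,
// executeBatch() sends all pending ones in one round trip.
public class BatchRecorder {
    private final int batchSize;
    private int pending = 0;
    public final List<Integer> executedBatchSizes = new ArrayList<>();

    public BatchRecorder(int batchSize) {
        this.batchSize = batchSize;
    }

    public void addBatch() {
        pending++;
        if (pending == batchSize) {  // batch is full: execute it
            executeBatch();
        }
    }

    public void executeBatch() {     // also called at flush/commit for the leftover
        if (pending > 0) {
            executedBatchSizes.add(pending);
            pending = 0;
        }
    }

    public static void main(String[] args) {
        BatchRecorder recorder = new BatchRecorder(100);
        for (int i = 0; i < 250; i++) {
            recorder.addBatch();               // 250 saved entities
        }
        recorder.executeBatch();               // final flush sends the remainder
        System.out.println(recorder.executedBatchSizes); // prints "[100, 100, 50]"
    }
}
```

With 250 statements and a batch size of 100, two full batches go out automatically and the final flush sends the remaining 50.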
So setting this property does not protect you from memory exhaustion.
To take care of memory, you have to flush your session on a regular basis, and this is the purpose of the flushing loop.
When you write:
int i = 0;
for (Entity e : entities) {
    session.save(e);
    i++;
    if (i % 100 == 0 && i > 0) {
        session.flush();
        session.clear();
    }
}
you are telling Hibernate to flush and clear the session every 100 entities, which releases memory.
So now, what is the link between the two?
To be optimal, you have to make your jdbc.batch_size
and your flushing interval identical.
If you define a flush interval lower than the batch_size you chose, Hibernate will flush the session more frequently, so it will create small batches until it reaches the batch size, which is not efficient.
When the two are the same, Hibernate will only execute batches of the optimal size, except for the last one if the size of the collection is not a multiple of your batch_size.
You can see the following post for more details about this last point
hibernate.jdbc.batch_size determines the maximum batch size that is executed. If an implicit or explicit flush is performed before the specified batch size is reached (counting the pending insert or update statements for the same table), all pending statements are packed into one batch, and the 'accumulation' of statements is restarted.
So, in your example you would execute batches consisting of 100 statements each. Or, for example, if the batch size were 100 and the modulo divider were 500, when the flush operation occurs you would execute 5 batches consisting of 100 statements each.
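That arithmetic can be sketched as a small helper (the `batchesPerFlush` name is made up; this only models the counting described above, not Hibernate internals):

```java
import java.util.ArrayList;
import java.util.List;

public class FlushMath {
    /**
     * Models how many JDBC batches one flush produces: the statements
     * accumulated since the last flush are sent in chunks of at most
     * batchSize statements each.
     */
    static List<Integer> batchesPerFlush(int pendingStatements, int batchSize) {
        List<Integer> batches = new ArrayList<>();
        while (pendingStatements > 0) {
            int chunk = Math.min(batchSize, pendingStatements);
            batches.add(chunk);
            pendingStatements -= chunk;
        }
        return batches;
    }

    public static void main(String[] args) {
        // batch size 100, flushing every 500 entities: 5 batches of 100 per flush
        System.out.println(batchesPerFlush(500, 100));  // prints "[100, 100, 100, 100, 100]"
        // batch size 1000, flushing every 100 entities: 1 undersized batch per flush
        System.out.println(batchesPerFlush(100, 1000)); // prints "[100]"
    }
}
```

The second case is the one the answers above warn about: flushing more often than the batch size only produces undersized batches.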