How to persist a lot of entities (JPA)

半阙折子戏 2020-12-16 12:44

I need to process a CSV file and for each record (line) persist an entity. Right now, I do it this way:

while ((line = reader.readNext()) != null) {
    Enti         


        
4 Answers
  • 2020-12-16 13:26

    You can write them directly to the database with a plain SQL INSERT statement, bypassing the entity machinery.

    See EntityManager.createNativeQuery.

  • 2020-12-16 13:27

    I think one common way to do this is with transactions. If you begin a new transaction and then persist a large number of objects, the provider can hold the inserts back until the persistence context is flushed, which happens at commit at the latest. This can gain you some efficiency if you have a large number of items to commit.

    Check out EntityManager.getTransaction

  • 2020-12-16 13:39

    To make it go faster, at least in Hibernate, call flush() and clear() after every so many inserts. I have used this approach for millions of records and it works. It's still slow, but it's much faster than not doing it. The basic structure is like this:

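    int i = 0;
    for (MyThingy thingy : lotsOfThingies) {

        dao.save(thingy.toModel());

        // Every 20 saves, push the pending inserts to the database and
        // empty the persistence context so it does not keep growing.
        // flushAndClear() is assumed to wrap flush() followed by clear().
        if (++i % 20 == 0) {
            dao.flushAndClear();
        }

    }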
    
  • 2020-12-16 13:48

    The JPA API doesn't give you all the options you need to make this optimal. Depending on how fast you want this to be, you are going to have to look for ORM-specific options (Hibernate in your case).

    Things to check:

    1. Check you are using a single transaction (Yes, apparently you are sure of this)
    2. Check your JPA provider (Hibernate) is using the JDBC batch API (refer: hibernate.jdbc.batch_size)
    3. Check if you can bypass getting generated keys (depends on db/jdbc driver how much benefit you get from this - refer: hibernate.jdbc.use_get_generated_keys)
    4. Check if you can bypass cascade logic (only minimal performance benefit from this)
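
    For Hibernate, items 2 and 3 of the checklist above come down to a few configuration properties. The sketch below only assembles them into a map; the class name, the method name and the batch size are illustrative assumptions, and hibernate.order_inserts is an extra setting not mentioned above that is commonly enabled together with batching. As far as I know, Hibernate does not batch inserts for entities whose ids use IDENTITY generation, so check your id strategy as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects Hibernate batching properties in one place.
final class HibernateBatchSettings {

    private HibernateBatchSettings() {
    }

    /** Returns the properties for the given JDBC batch size. */
    static Map<String, String> batchProperties(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, String> props = new LinkedHashMap<>();
        // Item 2: send inserts through the JDBC batch API in groups of batchSize.
        props.put("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Assumption (not in the list above): group inserts by entity type
        // so that consecutive statements can share a batch.
        props.put("hibernate.order_inserts", "true");
        // Item 3: skip reading generated keys back; whether this is safe and
        // how much it helps depends on the driver and the id strategy.
        props.put("hibernate.jdbc.use_get_generated_keys", "false");
        return props;
    }
}
```

    A map like this can be passed as the second argument of Persistence.createEntityManagerFactory(unitName, properties), or the same keys can go into persistence.xml.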

    So in Ebean ORM this would be:

        EbeanServer server = Ebean.getServer(null);
    
        Transaction transaction = server.beginTransaction();
        try {
            // Use JDBC batch API with a batch size of 100
            transaction.setBatchSize(100);
            // Don't bother getting generated keys
            transaction.setBatchGetGeneratedKeys(false);
            // Skip cascading persist 
            transaction.setPersistCascade(false);
    
            // persist your beans ...
            Iterator<YourEntity> it = entities.iterator(); // entities: placeholder for your own collection
            while (it.hasNext()) {
                YourEntity yourEntity = it.next();
                server.save(yourEntity);
            }
    
            transaction.commit();
        } finally {
            transaction.end();
        }
    

    Oh, and if you do this via raw JDBC you skip the ORM overhead (less object creation / garbage collection etc) - so I wouldn't ignore that option.
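
    A minimal sketch of that raw JDBC route, using only java.sql. The table and column names (person, name, email), the class name and the two-column row layout are assumptions made for illustration; the Connection would come from your own DataSource or driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: batch-inserts rows with plain JDBC in one transaction.
final class JdbcBatchInsert {

    private JdbcBatchInsert() {
    }

    /**
     * Inserts all rows in a single transaction, sending them to the driver
     * in groups of batchSize. Each row holds {name, email}.
     * Returns the number of rows queued for insertion.
     */
    static int insertAll(Connection con, List<String[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Hypothetical table and columns.
        String sql = "INSERT INTO person (name, email) VALUES (?, ?)";
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // one transaction for the whole file
        int count = 0;
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++count % batchSize == 0) {
                    ps.executeBatch(); // send a full batch
                }
            }
            if (count % batchSize != 0) {
                ps.executeBatch(); // send the final partial batch
            }
            con.commit();
        } catch (SQLException | RuntimeException e) {
            con.rollback(); // undo everything if any row fails
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
        return count;
    }
}
```

    Reading the CSV line by line and handing the parsed fields to a method like this keeps memory use flat, since no entity objects or persistence context are involved.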

    So yes, this doesn't answer your question, but it might help your search for more ORM-specific batch insert tweaks.
