JPA: what is the proper pattern for iterating over large result sets?

攒了一身酷 2020-11-27 09:50

Let's say I have a table with millions of rows. Using JPA, what's the proper way to iterate over a query against that table, such that I don't have all an in-memo

15 Answers
  • 2020-11-27 10:22

    I tried the answers presented here, but JBoss 5.1 + MySQL Connector/J 5.1.15 + Hibernate 3.3.2 didn't work with those. We've just migrated from JBoss 4.x to JBoss 5.1, so we've stuck with it for now, and thus the latest Hibernate we can use is 3.3.2.

    Adding a couple of extra parameters did the job, and code like this runs without OOMEs:

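            // Hibernate-specific API (org.hibernate.*): Session, StatelessSession, Query,
            // ScrollableResults, ScrollMode, LockMode
            StatelessSession session = ((Session) entityManager.getDelegate()).getSessionFactory().openStatelessSession();
            try {
                Query query = session
                        .createQuery("SELECT a FROM Address a WHERE .... ORDER BY a.id");
                // The three settings below are what keep scroll() from loading everything
                query.setFetchSize(1000);
                query.setReadOnly(true);
                query.setLockMode("a", LockMode.NONE);
                ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
                try {
                    while (results.next()) {
                        Address addr = (Address) results.get(0);
                        // Do stuff
                    }
                } finally {
                    results.close();
                }
            } finally {
                // Always release the session, even if processing fails
                session.close();
            }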
    

    The crucial lines are the query parameters between createQuery and scroll. Without them the "scroll" call tries to load everything into memory and either never finishes or runs to OutOfMemoryError.

  • 2020-11-27 10:24

    You can't really do this in straight JPA; however, Hibernate has support for stateless sessions and scrollable result sets.

    We routinely process billions of rows with its help.

    Here is a link to documentation: http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html#batch-statelesssession

  • 2020-11-27 10:29

    Here's a simple, straight JPA example (in Kotlin) that shows how you can paginate over an arbitrarily large result set, reading chunks of 100 items at a time, without using a cursor (each cursor consumes resources on the database). It uses keyset pagination.

    See https://use-the-index-luke.com/no-offset for the concept of keyset pagination, and https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/ for a comparison of different ways to paginate along with their drawbacks.

    /*
    create table my_table(
      id int primary key, -- index will be created
      my_column varchar
    )
    */
    
    fun keysetPaginationExample() {
        var lastId = Integer.MIN_VALUE
        do {
    
            val someItems =
            myRepository.findTop100ByMyTableIdAfterOrderByMyTableId(lastId)
    
            if (someItems.isEmpty()) break
    
            lastId = someItems.last().myTableId
    
            for (item in someItems) {
              process(item)
            }
    
        } while (true)
    }
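
    For readers who prefer Java, here is a minimal sketch of the same keyset loop. It is only an illustration: the class `KeysetPaginationSketch` and its in-memory `TreeMap` "table" are hypothetical stand-ins, and in a real application `fetchPageAfter` would presumably run a query along the lines of `WHERE e.id > :lastId ORDER BY e.id` with a page-size limit. Using `null` as the starting marker is a choice made here so that no id value has to be reserved as a sentinel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a table with an indexed integer primary key.
class KeysetPaginationSketch {

    private final TreeMap<Integer, String> table = new TreeMap<>();

    void insert(int id, String value) {
        table.put(id, value);
    }

    /** Returns up to pageSize ids strictly greater than lastId, ascending (null = from the start). */
    List<Integer> fetchPageAfter(Integer lastId, int pageSize) {
        NavigableMap<Integer, String> remaining =
                (lastId == null) ? table : table.tailMap(lastId, false);
        List<Integer> page = new ArrayList<>();
        for (Integer id : remaining.keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    /** Visits every id page by page and returns the number of non-empty pages read. */
    int forEachId(int pageSize, Consumer<Integer> action) {
        int pages = 0;
        Integer lastId = null;
        while (true) {
            List<Integer> page = fetchPageAfter(lastId, pageSize);
            if (page.isEmpty()) {
                return pages;
            }
            page.forEach(action);
            // The last id of this page is the lower bound of the next one.
            lastId = page.get(page.size() - 1);
            pages++;
        }
    }

    public static void main(String[] args) {
        KeysetPaginationSketch table = new KeysetPaginationSketch();
        for (int i = 1; i <= 250; i++) {
            table.insert(i * 3, "row" + i); // ids with gaps, as in real tables
        }
        int pages = table.forEachId(100, id -> { /* process(id) */ });
        System.out.println("pages read: " + pages);
    }
}
```

    Only one page is held in memory at a time, and each fetch starts from the last id seen rather than from an offset, so the cost of a page does not grow with its distance from the start.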
    
  • 2020-11-27 10:31

    You can use another "trick": load only the identifiers of the entities you're interested in. If the identifier is a long (8 bytes), a list of 10^6 such identifiers takes around 8 MB. For a batch process (one instance at a time) that is bearable. Then just iterate over the identifiers and do the job.

    One more remark: you should do this in chunks anyway, especially if you modify records; otherwise the rollback segment in the database will grow.

    As for the firstResult/maxRows strategy, it will be very, very slow for results far from the top.

    Also take into consideration that the database is probably operating in read committed isolation, so to avoid phantom reads, load the identifiers first and then load the entities one by one (or 10 by 10, or whatever).
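
    The "load the identifiers, then work in chunks" idea could look roughly like this. It is a sketch under assumptions: `IdChunkingSketch` and `chunks` are hypothetical names, the id list is generated in memory instead of coming from a query such as `SELECT e.id FROM ...`, and the per-chunk work (load the entities for those ids, process, commit) is only indicated in a comment.

```java
import java.util.ArrayList;
import java.util.List;

class IdChunkingSketch {

    /** Splits ids into consecutive sublists of at most chunkSize elements. */
    static List<List<Long>> chunks(List<Long> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Long>> result = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, ids.size());
            result.add(ids.subList(from, to)); // a view, no copying
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the identifier list loaded up front.
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 25; id++) {
            ids.add(id);
        }
        for (List<Long> chunk : chunks(ids, 10)) {
            // In a real job: begin a transaction, load the entities for this chunk,
            // process them, commit, and clear the persistence context.
            System.out.println("processing " + chunk.size() + " ids starting at " + chunk.get(0));
        }
    }
}
```

    Committing per chunk keeps each transaction small, which is the point of the rollback-segment remark above.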

  • 2020-11-27 10:31

    Use pagination to retrieve the results.
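
    A rough sketch of what that could mean in practice: read a fixed-size page, process it, move the offset forward, and stop at the first short page. The `PageFetcher` interface, `forEachPage`, and the in-memory fetcher are hypothetical helpers made up for this illustration. With JPA, the fetcher would presumably wrap a query with a stable `ORDER BY` and call `setFirstResult(offset)` and `setMaxResults(limit)` before `getResultList()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class OffsetPagingSketch {

    /** Hypothetical page source: returns at most limit rows starting at offset. */
    interface PageFetcher<T> {
        List<T> fetch(int offset, int limit);
    }

    /** Feeds every row to action one page at a time; returns the number of rows seen. */
    static <T> int forEachPage(PageFetcher<T> fetcher, int pageSize, Consumer<T> action) {
        int offset = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            page.forEach(action);
            if (page.size() < pageSize) {
                return offset + page.size(); // a short page means we reached the end
            }
            offset += pageSize;
        }
    }

    /** In-memory fetcher used only to make the sketch runnable. */
    static <T> PageFetcher<T> inMemory(List<T> rows) {
        return (offset, limit) -> rows.subList(
                Math.min(offset, rows.size()),
                Math.min(offset + limit, rows.size()));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rows.add(i);
        }
        int total = forEachPage(inMemory(rows), 100, row -> { /* process(row) */ });
        System.out.println("rows processed: " + total);
    }
}
```

    Keep in mind the caveat raised in another answer here: with offset paging, pages far from the start tend to get slow, because the database still has to skip all the preceding rows.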

  • 2020-11-27 10:34

    An example with JPA and a native query that fetches fetchSize elements at a time using offsets:

    public List<X> getXByFetching(int fetchSize) {
        int totalX = getTotalRows(Entity);
        List<X> result = new ArrayList<>();
        for (int offset = 0; offset < totalX; offset += fetchSize) {
            EntityManager entityManager = getEntityManager();
            // Skip the rows already read; the page size is capped below
            String sql = getSqlSelect(Entity) + " OFFSET " + offset + " ROWS";
            Query query = entityManager.createNativeQuery(sql, X.class);
            query.setMaxResults(fetchSize);
            result.addAll(query.getResultList());
            // Clear the persistence context after each page so it does not keep growing
            entityManager.flush();
            entityManager.clear();
        }
        // Return only after the loop has read every page
        return result;
    }
    