JPA: what is the proper pattern for iterating over large result sets?

攒了一身酷 2020-11-27 09:50

Let's say I have a table with millions of rows. Using JPA, what's the proper way to iterate over a query against that table, such that I don't end up with an in-memory List holding millions of objects?

15 answers
  • 2020-11-27 10:35

    It depends on the kind of operation you have to do. Why are you looping over millions of rows? Are you updating something in batch mode? Are you going to display all the records to a client? Are you computing statistics on the retrieved entities?

    If you are going to display a million records to the client, please reconsider your user interface. In that case, the appropriate solution is to paginate your results using setFirstResult() and setMaxResults().
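
    For example, a minimal pagination loop over such a query, assuming an injected EntityManager named em and a hypothetical MyEntity class (adjust the names to your model):

    int pageSize = 50;                      // rows fetched per round trip
    for (int first = 0; ; first += pageSize) {
        List<MyEntity> page = em.createQuery(
                "SELECT e FROM MyEntity e ORDER BY e.id", MyEntity.class)
            .setFirstResult(first)          // offset of this page's first row
            .setMaxResults(pageSize)        // upper bound on rows returned
            .getResultList();
        if (page.isEmpty()) {
            break;                          // no more rows to show
        }
        // display or process the current page, then let it go out of scope
    }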

    If you are launching an update over a large number of records, you'd better keep the update simple and use Query.executeUpdate(). Optionally, you can run the update asynchronously using a Message-Driven Bean or a Work Manager.
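
    For instance, such a bulk update expressed as a single JPQL statement (the entity and field names are only placeholders):

    int updated = em.createQuery(
            "UPDATE MyEntity e SET e.status = :target WHERE e.status = :source")
        .setParameter("target", "ARCHIVED")
        .setParameter("source", "EXPIRED")
        .executeUpdate();                   // runs as one statement inside the database
    // note: bulk updates bypass the persistence context, so clear or refresh it afterwards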

    If you are computing statistics on the retrieved entities, you can take advantage of the grouping functions defined by the JPA specification.
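
    For example, a grouped aggregation computed entirely in the database, returning one row per group instead of millions of entities (again with a hypothetical MyEntity):

    List<Object[]> stats = em.createQuery(
            "SELECT e.status, COUNT(e), AVG(e.amount) FROM MyEntity e GROUP BY e.status",
            Object[].class)
        .getResultList();                   // each Object[] holds {status, count, average}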

    For any other case, please be more specific :)

  • 2020-11-27 10:35

    I was surprised to see that the use of stored procedures was not more prominent in the answers here. In the past, when I've had to do something like this, I've created a stored procedure that processes data in small chunks, sleeps for a bit, then continues. The reason for the sleeping is to avoid overwhelming the database, which is presumably also serving more real-time queries, such as those backing a web site. If no one else is using the database, you can leave out the sleep. If you need to ensure that you process each record once and only once, you will need to create an additional table (or field) to record which records you have already processed, so the job is resilient across restarts.

    The performance savings here are significant, possibly orders of magnitude faster than anything you could do in JPA/Hibernate/app-server land, and your database server will most likely have its own server-side cursor mechanism for processing large result sets efficiently. The savings come from not having to ship the data from the database server to the application server, process it there, and ship it back.

    There are some significant downsides to using stored procedures which may completely rule this out for you, but if you've got that skill in your personal toolbox and can use it in this kind of situation, you can knock out these kinds of things fairly quickly.
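
    If the chunked procedure still has to be kicked off from the Java side, JPA 2.1 can invoke it directly. A sketch, assuming you have created a procedure named process_chunk that takes a chunk-size parameter:

    StoredProcedureQuery proc = em.createStoredProcedureQuery("process_chunk");
    proc.registerStoredProcedureParameter("chunk_size", Integer.class, ParameterMode.IN);
    proc.setParameter("chunk_size", 1000);
    proc.execute();                         // the heavy lifting stays on the database server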

  • 2020-11-27 10:35

    To expand on @Tomasz Nurkiewicz's answer: you have access to the DataSource, which in turn can provide you with a connection:

    @Resource(name = "myDataSource",
        lookup = "java:comp/DefaultDataSource")
    private DataSource myDataSource;
    

    In your code you then have:

    try (Connection connection = myDataSource.getConnection()) {
        // raw jdbc operations
    }
    

    This allows you to bypass JPA for specific large batch operations such as import/export, while you still have access to the entity manager for other JPA operations if you need it.
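
    A sketch of such a batch read over that connection, streaming rows with a modest fetch size instead of materializing them all (the table and column names are placeholders, fetch-size behaviour is driver-dependent, and SQLException handling is omitted):

    try (Connection connection = myDataSource.getConnection();
         PreparedStatement ps = connection.prepareStatement(
                 "SELECT id, payload FROM big_table",
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        ps.setFetchSize(500);               // hint the driver to stream in chunks of 500 rows
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                long id = rs.getLong("id");
                String payload = rs.getString("payload");
                // export or transform the row here; nothing is accumulated in memory
            }
        }
    }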
