Large ResultSet on PostgreSQL query

难免孤独 2021-01-05 23:57

I'm running a query against a table in a PostgreSQL database. The database is on a remote machine. The table has around 30 sub-tables using PostgreSQL partitioning capabili

4 Answers
  • 2021-01-06 00:41

    I'm betting that there's not a single client of your app that needs 1.8M rows all at the same time. You should think of a sensible way to chunk the results into smaller pieces and give users the chance to iterate through them.

    That's what Google does. When you do a search there might be millions of hits, but they return one page of results at a time, with the idea that you'll find what you want on the first page.
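
    One way to chunk a large result is keyset paging, where each request asks only for the rows after the last id already seen. The following is a minimal sketch under stated assumptions, not the asker's actual schema: the table `measurements`, its columns `id` and `payload`, and the names `PagedQuery`, `Row` and `fetchPage` are all hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class PagedQuery {
    // Hypothetical table and columns, used only for illustration.
    static final String PAGE_SQL =
        "SELECT id, payload FROM measurements WHERE id > ? ORDER BY id LIMIT ?";

    /** One row of the hypothetical table. */
    static final class Row {
        final long id;
        final String payload;

        Row(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    /** Returns up to pageSize rows whose id is greater than afterId. */
    static List<Row> fetchPage(Connection conn, long afterId, int pageSize)
            throws SQLException {
        List<Row> page = new ArrayList<>(pageSize);
        try (PreparedStatement ps = conn.prepareStatement(PAGE_SQL)) {
            ps.setLong(1, afterId);   // resume after the last id already delivered
            ps.setInt(2, pageSize);   // cap the number of rows in this page
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    page.add(new Row(rs.getLong("id"), rs.getString("payload")));
                }
            }
        }
        return page;
    }
}
```

    A caller would pass 0 (or any value below the smallest id) for the first page, then the id of the last row received for each following page, and stop when a page comes back with fewer rows than requested.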

    If it's not a client, and the results are being massaged in some way, I'd recommend letting the database crunch all those rows and simply return the result. It makes no sense to return 1.8M rows just to do a calculation on the middle tier.

    If neither of those apply, you've got a real problem. Time to rethink it.

    After reading the later responses it sounds to me like this is more of a reporting solution that ought to be crunched in batch or calculated in real time and stored in tables that are not part of your transactional system. There's no way that bringing 1.8M rows to the middle tier for calculating moving averages can scale.

    I'd recommend reorienting yourself - start thinking about it as a reporting solution.

  • 2021-01-06 00:46

    I did everything above, but I needed one last piece: be sure the call is wrapped in a transaction and set the transaction to read only, so that no rollback state is required.

    I added this: @Transactional(readOnly = true)

    Cheers.

  • 2021-01-06 00:50

    In order to use a cursor to retrieve data, you have to set the ResultSet type to ResultSet.TYPE_FORWARD_ONLY (the default) and set autocommit to false, in addition to setting a fetch size. That is covered in the doc you linked to, but you didn't explicitly mention that you did those steps.
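
    The three settings named above can be sketched in plain JDBC roughly as follows. This is an illustration under assumptions rather than the asker's code: the class `CursorFetch`, the `RowHandler` callback and the method `streamRows` are hypothetical names, and the query text and fetch size are left to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class CursorFetch {

    /** Callback invoked once per row; a hypothetical helper type for this sketch. */
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /**
     * Runs the query with autocommit off, a forward-only ResultSet and a
     * fetch size, and returns the number of rows passed to the handler.
     */
    static long streamRows(Connection conn, String sql, int fetchSize, RowHandler handler)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        // Cursor-based fetching requires autocommit to be off on THIS connection.
        conn.setAutoCommit(false);
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Ask the driver to fetch rows in batches of fetchSize.
            ps.setFetchSize(fetchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.handle(rs);
                    count++;
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            // Restore whatever autocommit mode the caller had.
            conn.setAutoCommit(previousAutoCommit);
        }
        return count;
    }
}
```

    The statement is prepared on the same Connection object whose autocommit flag was changed, which matters when connections come from a pool.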

    Be careful with PostgreSQL's partitioning scheme. It can badly confuse the optimizer and cause massive performance problems where there should be none (depending on the specifics of your data). In any case, is your table only 1.8M rows? There is no reason to partition it based on size alone, provided that it is appropriately indexed.

  • 2021-01-06 01:00

    The fetchSize property worked as described in the PostgreSQL manual.

    My mistake was that I was setting autocommit to false on a connection from the connection pool that was not the connection being used by the prepared statement.

    Thanks for all the feedback.
