Using “Cursors” for paging in PostgreSQL [duplicate]

佐手、 提交于 2019-11-28 09:02:20
Craig Ringer

Cursors are a reasonable choice for paging in smaller intranet applications that work with large data sets, but you need to be prepared to discard them after a timeout. Users like to wander off, go to lunch, go on holiday for two weeks, etc, and leave their applications running. If it's a web-based app there's even the question of what "running" is and how to tell if the user is still around.

They are not suitable for large-scale applications with high client counts and clients that come and go near-randomly like in web-based apps or web APIs. I would not recommend using cursors in your application unless you have a fairly small client count and very high request rates ... in which case sending tiny batches of rows will be very inefficient and you should think about allowing range-requests etc instead.

Cursors have several costs. If the cursor is not WITH HOLD you must keep a transaction open. The open transaction can prevent autovacuum from doing its work properly, causing table bloat and other issues. If the cursor is declared WITH HOLD and the transaction isn't held open you have to pay the cost of materializing and storing a potentially large result set - at least, I think that's how hold cursors work. The alternative is just as bad, keeping the transaction implicitly open until the cursor is destroyed and preventing rows from being cleaned up.

Additionally, if you're using cursors you can't hand connections back to a connection pool. You'll need one connection per client. That means more backend resources are used just maintaining session state, and sets a very real upper limit on the number of clients you can handle with a cursor-based approach.

There's also the complexity and overhead of managing a stateful, cursor-based setup as compared to a stateless connection-pooling approach with limit and offset. You need to have your application expire cursors after a timeout or you face potentially unbounded resource use on the server, and you need to keep track of which connections have which cursors for which result sets for which users....

In general, despite the fact that it can be quite inefficient, LIMIT and OFFSET can be the better solution. It can often be better to search the primary key rather than using OFFSET, though.

By the way, you were looking at the documentation for cursors in PL/pgSQL. You want normal SQL-level cursors for this job.


Do the cursors require that a database connection be left open?

Yes.

Do the cursors run inside a transaction, locking resources until they are "closed"?

Yes unless they are WITH HOLD, in which case they consume other database resources.

Are there any other "gotchas" that I'm not aware of?

Yes, as the above should explain.

Colin 't Hart

For HTTP clients, don't use cursors to implement paging. For scalability, you don't want server resources tied up between requests.

Instead, use LIMIT and OFFSET on your queries; see LIMIT and OFFSET in the Pg docs.

But make sure that the indexing on your tables will support efficient queries of this form.

Design a RESTful API, so that the client can invoke the "next_url" (also passed in the response) to get the next set of rows.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!