Necessity of explicit cursor.close()

既然无缘 2020-12-28 13:57

From time to time, I'm executing raw queries using connection.cursor() instead of using ORM (since it is definitely not a silver bullet).

I've noticed that in seve

5 Answers
  • 2020-12-28 14:41

    The explicit calling of cursor.close() might be because of two reasons:

    1. __del__ is not guaranteed to be called and has some issues you can read here and here
    2. Explicit is better than implicit (Zen of Python)
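
A minimal sketch of point 1, assuming CPython's reference counting plus cycle collector. FakeCursor and Holder are hypothetical stand-ins (not Django or driver classes); the point is only that an object caught in a reference cycle is not finalized when its last name goes away, so anything done in __del__ is delayed until the garbage collector runs.

```python
import gc


class FakeCursor:
    """Hypothetical cursor that records when its finalizer runs."""

    finalized = []  # names of cursors whose __del__ has run

    def __init__(self, name):
        self.name = name
        self.owner = None

    def __del__(self):
        FakeCursor.finalized.append(self.name)


class Holder:
    """Hypothetical object that keeps a cursor and is referenced back by it."""

    def __init__(self, cursor):
        self.cursor = cursor
        cursor.owner = self  # creates a reference cycle


gc.disable()  # keep the automatic collector from running mid-example
try:
    cur = FakeCursor("a")
    holder = Holder(cur)
    del cur, holder  # no names left, but the cycle keeps both objects alive

    finalized_before_gc = list(FakeCursor.finalized)
    gc.collect()  # only now is the cycle found and __del__ invoked
    finalized_after_gc = list(FakeCursor.finalized)
finally:
    gc.enable()
```

An explicit close() call (or a with block) does not depend on any of this timing.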
  • 2020-12-28 14:43

    While the operating system can normally be relied upon to release resources, it's always good hygiene to close things like database connections explicitly, so that resources are released as soon as they are no longer required. The really important thing from a database point of view is to ensure that any changes are commit()ed.

  • 2020-12-28 14:44

    I'm a bit late to this question. Maybe a close-on-exit-scope is what you want.

    from contextlib import closing
    from django.db import connection
    
    with closing(connection.cursor()) as cursor:
        cursor.execute(...)
        cursor.execute(...)
        cursor.execute(...)
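
The same pattern can be tried without a Django project. This is a sketch using the standard-library sqlite3 module in place of django.db.connection; the item table and its contents are made up for illustration.

```python
import sqlite3
from contextlib import closing

conn = sqlite3.connect(":memory:")

# closing() calls cursor.close() when the block exits, even on an exception.
with closing(conn.cursor()) as cursor:
    cursor.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    cursor.execute("INSERT INTO item (name) VALUES (?)", ("widget",))
    cursor.execute("SELECT name FROM item")
    rows = cursor.fetchall()

# The cursor is closed now; sqlite3 refuses further use of it.
try:
    cursor.execute("SELECT 1")
    reusable = True
except sqlite3.ProgrammingError:
    reusable = False

conn.close()
```

In Django 1.7 and later the cursor returned by connection.cursor() can itself be used as a context manager (with connection.cursor() as cursor: ...), so the closing() wrapper is mainly needed for older versions or for drivers whose cursors lack that support.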
    
  • 2020-12-28 14:47

    Django's cursor class is just a wrapper around the underlying DB's cursor, so the effect of leaving the cursor open is basically tied to the underlying DB driver.

    According to the FAQ for psycopg2 (the DB driver Django uses for PostgreSQL), its cursors are lightweight but cache the data returned by the queries you run through them, which can waste memory:

    Cursors are lightweight objects and creating lots of them should not pose any kind of problem. But note that cursors used to fetch result sets will cache the data and use memory in proportion to the result set size. Our suggestion is to almost always create a new cursor and dispose old ones as soon as the data is not required anymore (call close() on them.) The only exception are tight loops where one usually use the same cursor for a whole bunch of INSERTs or UPDATEs.

    Django uses MySQLdb as the backend for MySQL, which has several different types of cursors, including some that actually store their result sets on the server side. The MySQLdb documentation for Cursor.close makes a point of noting that it's very important to close server-side cursors when you're done with them:

    If you are using server-side cursors, it is very important to close the cursor when you are done with it and before creating a new one.

    However, this isn't relevant for Django, because it uses the default Cursor class provided by MySQLdb, which stores results on the client side. Leaving a used cursor open just risks wasting the memory held by the stored result set, as with psycopg2. The close method on the cursor simply exhausts any remaining result sets and drops the internal reference to the DB connection:

    def close(self):
        """Close the cursor. No further queries will be possible."""
        if not self.connection: return
        while self.nextset(): pass
        self.connection = None
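
To see what that method amounts to, here is a toy model of a client-side cursor. ToyConnection and ToyCursor are hypothetical classes written for this sketch, not MySQLdb's implementation; only the body of close() mirrors the excerpt above.

```python
class ToyConnection:
    """Hypothetical connection holding result sets not yet handed out."""

    def __init__(self, result_sets):
        self._pending = [list(rs) for rs in result_sets]

    def next_result(self):
        # Return the next result set, or None when there are no more.
        return self._pending.pop(0) if self._pending else None


class ToyCursor:
    """Hypothetical cursor that buffers one result set on the client."""

    def __init__(self, connection):
        self.connection = connection
        self._rows = connection.next_result() or []

    def nextset(self):
        # Advance to the next result set, discarding the buffered rows.
        rows = self.connection.next_result()
        if rows is None:
            self._rows = []
            return None
        self._rows = rows
        return True

    def close(self):
        """Close the cursor. No further queries will be possible."""
        if not self.connection:
            return
        while self.nextset():
            pass
        self.connection = None


conn = ToyConnection([[1, 2, 3], [4, 5]])
cur = ToyCursor(conn)
cur.close()
cur.close()  # a second call is a no-op because the connection is already None
```

After close(), the cursor holds neither buffered rows nor a reference to the connection, which is the whole effect described above.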
    

    As best I can tell from looking at their source, the remaining backends used by Django (cx_Oracle, sqlite3/pysqlite2) follow the same pattern: they free memory by deleting or resetting stored results and object references. The sqlite3 docs at the time didn't even mention that the Cursor class has a close method, and it was only used sporadically in the included example code.

    You are right that a cursor will be closed when __del__() is called on the cursor object, so the need to close explicitly is only an issue if you're keeping a long-lived reference to the cursor, e.g. a self.cursor object that you're keeping as an instance attribute of a class.
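
A sketch of that long-lived case, assuming sqlite3 as the driver. ReportReader and the item table are hypothetical; the idea is that an object which stores a cursor in self.cursor should expose its own close() (and ideally context-manager support), so the cursor does not live as long as the object does.

```python
import sqlite3


class ReportReader:
    """Hypothetical helper that keeps one cursor for its whole lifetime."""

    def __init__(self, conn):
        self.cursor = conn.cursor()

    def names(self):
        self.cursor.execute("SELECT name FROM item ORDER BY id")
        return [row[0] for row in self.cursor.fetchall()]

    def close(self):
        # Release the cursor explicitly instead of waiting for __del__.
        if self.cursor is not None:
            self.cursor.close()
            self.cursor = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",)])

with ReportReader(conn) as reader:
    names = reader.names()

cursor_released = reader.cursor is None
conn.close()
```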

  • 2020-12-28 14:49

    __del__/.close():

    1. __del__ is not guaranteed to be called
    2. some database drivers don't call cursor.close() in their __del__ (bad practice, but true)
    3. some database drivers don't actually create connections in the connection function, but in the cursor function instead (e.g. for 2&3: pyhive's presto [maybe they've since patched it])

    On server connections in general

    Most servers have an idle timeout configuration property (let's call that T). If a connection is idle for more than T seconds, the server will drop the connection. Most servers also have a property that sets the size of the worker thread pool (W). If you already have W connections to your server, a new connection attempt will likely hang. Imagine for a second that you don't have the option to explicitly close connections. In that situation, you have to set the timeout small enough that your worker pool is never completely used, and the right value depends on how many concurrent connections you have.

    However, if you do close your cursors/connections (even when not equivalent by [3] above, they behave in a similar way), then you don't have to manage these server configuration properties and your thread pool simply needs to be large enough to manage all concurrent connections (with the option for an occasional wait for new resources). I've seen some servers (e.g. Titan on Cassandra) unable to recover from running out of workers in the thread pool, so the whole server goes down until restarted.
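
The worker-pool argument can be sketched with a toy model. ToyServer and PooledConnection are hypothetical, and the sketch assumes a server that refuses a connection when all W workers are busy; a real server would more likely hang, as described above, but refusing keeps the example runnable.

```python
import threading


class PooledConnection:
    """Hypothetical client connection occupying one server worker."""

    def __init__(self, server):
        self._server = server
        self._open = True

    def close(self):
        # Give the worker back exactly once.
        if self._open:
            self._open = False
            self._server.release_worker()


class ToyServer:
    """Hypothetical server with a fixed pool of W workers."""

    def __init__(self, workers):
        self._slots = threading.BoundedSemaphore(workers)

    def connect(self):
        # Non-blocking acquire: return None instead of hanging when full.
        if not self._slots.acquire(blocking=False):
            return None
        return PooledConnection(self)

    def release_worker(self):
        self._slots.release()


server = ToyServer(workers=2)
first = server.connect()
second = server.connect()

refused = server.connect()   # pool exhausted: both workers are still held

first.close()                # explicit close frees a worker immediately
accepted = server.connect()  # succeeds without waiting for an idle timeout
```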

    TL;DR: If you're using very well-developed libraries, like the ones dano mentions, you won't have an issue. If you're using less polished libraries, you may end up blocking on the server acquiring a worker thread if you don't call .close(), depending on your server config and access rates.
