SQLAlchemy memory hog on select statement

南笙 2020-12-24 03:43

According to the SQLAlchemy documentation, select statements are treated as iterables in for loops. The intended effect is that a select statement returning a massive number of rows does not use excessive memory.

1 Answer
  • 2020-12-24 04:19

    The basic MySQLdb cursor fetches the entire query result at once from the server. This can consume a lot of memory and time. Use MySQLdb.cursors.SSCursor when you want to make a huge query and pull results from the server one at a time.
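The difference between the two fetch strategies can be sketched with the stdlib `sqlite3` module, whose cursors also yield rows lazily when iterated (the table `t` here is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (n INTEGER)')
conn.executemany('INSERT INTO t VALUES (?)', [(i,) for i in range(1000)])

# fetchall() materialises every row in one Python list at once --
# this is effectively what the default MySQLdb cursor does.
all_rows = conn.execute('SELECT n FROM t').fetchall()

# Iterating the cursor pulls rows one at a time, so only the current
# row is held in Python memory -- the SSCursor-style pattern.
total = 0
for (n,) in conn.execute('SELECT n FROM t'):
    total += n

print(len(all_rows), total)  # -> 1000 499500
```

Either way the same rows come back; the difference is whether the whole result set ever exists in memory at once.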

    Therefore, try passing connect_args={'cursorclass': MySQLdb.cursors.SSCursor} when creating the engine:

       from sqlalchemy import create_engine, MetaData
       import MySQLdb.cursors

       engine = create_engine('mysql://root:zenoss@localhost/e2',
                              connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
       meta = MetaData(engine, reflect=True)
       conn = engine.connect()
       # 'huge_table' stands in for whichever reflected table you are querying
       s = meta.tables['huge_table'].select()
       rs = conn.execute(s.execution_options(stream_results=True))
    

    See http://www.sqlalchemy.org/trac/ticket/1089


    Note that an SSCursor ties up its connection until the fetch is complete. This affects other cursors using the same connection: two cursors from the same connection cannot read from the table concurrently.

    However, cursors from different connections can read from the same table concurrently.

    Here is some code demonstrating the problem:

    import MySQLdb
    import MySQLdb.cursors as cursors
    import threading
    import logging
    import config
    
    logger = logging.getLogger(__name__)
    query = 'SELECT * FROM huge_table LIMIT 200'
    
    def oursql_conn():
        import oursql
        conn = oursql.connect(
            host=config.HOST, user=config.USER, passwd=config.PASS,
            db=config.MYDB)
        return conn
    
    def mysqldb_conn():
        conn = MySQLdb.connect(
            host=config.HOST, user=config.USER,
            passwd=config.PASS, db=config.MYDB,
            cursorclass=cursors.SSCursor) 
        return conn
    
    def two_cursors_one_conn():
        """Two SSCursors can not use one connection concurrently"""
        def worker(conn):
            cursor = conn.cursor()
            cursor.execute(query)
            for row in cursor:
                logger.info(row)
    
        conn = mysqldb_conn()
        threads = [threading.Thread(target=worker, args=(conn, ))
                   for n in range(2)]
        for t in threads:
            t.daemon = True
            t.start()
            # Second thread may hang or raise OperationalError:
            # File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py", line 289, in _fetch_row
            #   return self._result.fetch_row(size, self._fetch_type)
            # OperationalError: (2013, 'Lost connection to MySQL server during query')
    
        for t in threads:
            t.join()
    
    def two_cursors_two_conn():
        """Two SSCursors from independent connections can use the same table concurrently"""    
        def worker():
            conn = mysqldb_conn()        
            cursor = conn.cursor()
            cursor.execute(query)
            for row in cursor:
                logger.info(row)
    
        threads = [threading.Thread(target=worker) for n in range(2)]
        for t in threads:
            t.daemon = True
            t.start()
        for t in threads:
            t.join()
    
    
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    two_cursors_one_conn()
    two_cursors_two_conn()
    

    Note that oursql is an alternative set of MySQL bindings for Python. oursql cursors are true server-side cursors which fetch rows lazily by default. With oursql installed, if you change

    conn = mysqldb_conn()
    

    to

    conn = oursql_conn()
    

    then two_cursors_one_conn() runs without hanging or raising an exception.
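    The two-connection behaviour is not MySQL-specific. Here is a minimal stdlib `sqlite3` sketch of the same pattern as `two_cursors_two_conn()` (the table `t` and its 200 rows are invented for the demo): each thread opens its own connection and iterates the same table concurrently.

    ```python
    import os
    import sqlite3
    import tempfile
    import threading

    # Build a small throwaway database file (a stand-in for huge_table).
    path = os.path.join(tempfile.mkdtemp(), 'demo.db')
    setup = sqlite3.connect(path)
    setup.execute('CREATE TABLE t (n INTEGER)')
    setup.executemany('INSERT INTO t VALUES (?)', [(i,) for i in range(200)])
    setup.commit()
    setup.close()

    counts = []

    def worker():
        # Each thread opens its *own* connection, mirroring two_cursors_two_conn().
        conn = sqlite3.connect(path)
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM t')
        counts.append(sum(1 for _ in cursor))  # iterate rows lazily
        conn.close()

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

    print(counts)  # -> [200, 200]: both readers saw every row
    ```

    Both workers finish with the full row count, because neither shares a connection with the other.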
