Unresponsive requests: understanding the bottleneck (Flask + Oracle + Gunicorn)

Question

I'm new to Flask/Gunicorn and have a very basic understanding of SQL.

I have a Flask app that connects to a remote Oracle database with cx_Oracle. Depending on the route selected, it runs one of two queries. I run the app using gunicorn -w 4 flask:app. The first query is a simple query on a table with ~70,000 rows and is very responsive. The second one is more complex and queries several tables, one of which contains ~150 million rows. By sprinkling print statements around, I noticed that sometimes the second query never even starts, especially when its route is not the first one the user selects and both queries are supposed to run concurrently. Opening app.route('/') multiple times triggers its query multiple times in quick succession and runs it in parallel, but the same is not true of app.route('/2'). I have multiple workers enabled, and threaded=True for Oracle. Why is this happening? Is it doomed to be slow or downright unresponsive because of the size of the table?

import cx_Oracle
from flask import Flask
import pandas as pd

app = Flask(__name__)

# One module-level connection, shared by every request this process handles.
connection = cx_Oracle.connect("name", "pwd", threaded=True)

@app.route('/')
def Q1():
    print("start q1")
    querystring = """ select to_char(to_date(col1,'mm/dd/yy'),'Month'), sum(col2)
        FROM tbl1
        GROUP BY to_char(to_date(col1,'mm/dd/yy'),'Month')"""
    df = pd.read_sql(querystring, con=connection)
    print("q1 complete")
    return "q1 complete"  # a Flask view must return a response

@app.route('/2')
def Q2():
    print("start q2")
    querystring = """ select tbl2.col1,
        tbl2.col2,
        tbl3.col3
        FROM tbl2 INNER JOIN
        tbl3 ON tbl2.col1 = tbl3.col1
        WHERE tbl2.col2 like 'X%' AND
        tbl2.col4 >=20180101"""
    df = pd.read_sql(querystring, con=connection)
    print("q2 complete")
    return "q2 complete"  # a Flask view must return a response

I have tried exporting the datasets for each query as CSVs and having pandas read the CSVs instead. In that scenario, both reads run concurrently without missing a beat. Is this a SQL issue, a thread issue, or a worker issue?

Answer 1:

Be aware that a connection can only process one thing at a time. If the connection is busy executing one of the queries, it can't execute the other one. Once execution is complete and fetching has begun, the two queries can interleave, but each fetch call still has to wait for the other query's fetch call to finish before it can proceed. To get around this, you should use a session pool (http://cx-oracle.readthedocs.io/en/latest/module.html#cx_Oracle.SessionPool) and then in each of your routes add this code:

connection = pool.acquire()

None of that will speed up the slow query itself, but at least it will stop that query from interfering with the other one!
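As a rough sketch of the acquire-per-request pattern described above, the helper below borrows a connection from a pool and always returns it, even if the query raises. The names pooled_connection and FakePool are hypothetical and not part of cx_Oracle; FakePool only mimics the acquire() and release() methods of cx_Oracle.SessionPool so the example runs without a database. In the real app the pool would presumably be created once at module level with cx_Oracle.SessionPool, and the dsn, min, max and increment values shown in the comment are assumptions.

```python
import threading
from contextlib import contextmanager


@contextmanager
def pooled_connection(pool):
    """Borrow a connection from the pool and always hand it back."""
    connection = pool.acquire()
    try:
        yield connection
    finally:
        pool.release(connection)


class FakePool:
    """Hypothetical stand-in exposing acquire()/release() like a session pool."""

    def __init__(self, size):
        self._lock = threading.Lock()
        self._free = [f"conn-{i}" for i in range(size)]
        self.in_use = set()

    def acquire(self):
        with self._lock:
            connection = self._free.pop()
            self.in_use.add(connection)
            return connection

    def release(self, connection):
        with self._lock:
            self.in_use.discard(connection)
            self._free.append(connection)


# Assumption: in the real app this line would instead be something like
#   pool = cx_Oracle.SessionPool("name", "pwd", dsn, min=2, max=4,
#                                increment=1, threaded=True)
pool = FakePool(size=2)

# Two "requests" in flight at once each get their own connection.
with pooled_connection(pool) as c1, pooled_connection(pool) as c2:
    distinct = c1 != c2
    borrowed = len(pool.in_use)
```

Inside a route, the body would then read something like `with pooled_connection(pool) as connection: df = pd.read_sql(querystring, con=connection)`, so the slow query in one route holds its own connection instead of the one the fast route needs.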

Source: https://stackoverflow.com/questions/49390512/unresponsive-requests-understanding-the-bottleneck-flask-oracle-gunicorn

Tags

python

multithreading

Flask

gunicorn

cx-oracle