Python SQL Query Performance

余生长醉 submitted on 2019-12-06 07:37:45

Are you using JayDeBeApi with JPype or with Jython? With the JPype implementation, fetching a large result set makes several JNI calls for every single cell value, which causes a lot of overhead. You should consider one of the following options:

  1. Minimize the size of your result set. Do aggregations using SQL functions.
  2. Give the newest release of JPype1 a try. There have been some performance improvements.
  3. Switch your runtime to Jython (JayDeBeApi works on Jython as well).
  4. Implement the database queries and data extraction directly in Java, and call that logic through JPype using an interface that does not return a large data set.
  5. Try to improve the JPype and JayDeBeApi code.
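
Option 1 can be illustrated with a minimal sketch. It uses the standard-library sqlite3 module as a stand-in for a JayDeBeApi connection (both follow the DB-API 2.0 interface), and the table `measurements` with its columns is hypothetical. The point is only that the aggregated query returns one row per group instead of every row.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a JDBC connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")  # hypothetical table
cur.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0)],
)

# Costly pattern: fetch every row and aggregate in Python.
cur.execute("SELECT sensor, value FROM measurements")
totals, counts = {}, {}
for sensor, value in cur.fetchall():
    totals[sensor] = totals.get(sensor, 0.0) + value
    counts[sensor] = counts.get(sensor, 0) + 1
python_avg = {s: totals[s] / counts[s] for s in totals}

# Preferred pattern: let the database aggregate, so only one row per group is fetched.
cur.execute("SELECT sensor, AVG(value) FROM measurements GROUP BY sensor")
sql_avg = dict(cur.fetchall())

conn.close()
```

With JayDeBeApi over JPype the second query should matter far more than it does here, since every cell that is never fetched is a set of JNI calls that is never made.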

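You might want to use curs.fetchmany() instead of curs.fetchone(). That reduces, somewhat, the back and forth needed to fetch the rows. Note that fetchmany() without an argument fetches curs.arraysize rows, which the DB-API defaults to 1, so pass a size or set curs.arraysize first.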

Something like this will even hide the fact that you are fetching many rows at a time:

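def fetchYield(cursor):
    # Pull rows in batches, but hand them to the caller one at a time.
    # fetchmany() with no argument fetches cursor.arraysize rows (default 1),
    # so set cursor.arraysize before calling this.
    while True:
        rows = cursor.fetchmany()
        if not rows:
            # Result set exhausted. Use return rather than raise StopIteration:
            # since Python 3.7 (PEP 479) the latter becomes a RuntimeError.
            return
        for row in rows:
            yield row

for row in fetchYield(curs):
    print(row)  # do something with row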

However, I think that if a raw SQL query tool takes 3 minutes to fetch the data, it is not entirely unreasonable for your Python code to take 3x as long.

I had a similar problem and observed an improvement by using fetchall and setting the cursor's arraysize attribute (which defaults to 1), as described in the DB-API documentation on which JayDeBeApi is based.

import pandas as pd

# conn is an already open JayDeBeApi connection
cursor = conn.cursor()
cursor.arraysize = 10000  # number of rows fetched per batch
cursor.execute("select * from table1")

rows = cursor.fetchall()

# storing data in a pandas DataFrame
df = pd.DataFrame(data=rows, columns=["C1", "C2", "C3"])

cursor.close()

I observed the following timings when fetching 600,000 rows:

arraysize = 10000 --- 509 seconds
arraysize = 1     --- 526 seconds
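
A comparison like this can be reproduced with a small timing helper. The sketch below is an assumption about how such a measurement might be set up, not the code used for the figures above. It runs against an in-memory sqlite3 database so that it is self-contained, and `time_fetchall` is a hypothetical helper name. With JayDeBeApi you would pass your own connection instead, and the SQLite timings say nothing about JDBC performance.

```python
import sqlite3
import time


def time_fetchall(conn, query, arraysize):
    """Return (row_count, elapsed_seconds) for one execute + fetchall at the given arraysize."""
    cur = conn.cursor()
    cur.arraysize = arraysize
    start = time.perf_counter()
    cur.execute(query)
    rows = cur.fetchall()
    elapsed = time.perf_counter() - start
    cur.close()
    return len(rows), elapsed


# Stand-in data; replace conn with a JayDeBeApi connection to measure the real case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (c1 INTEGER, c2 INTEGER, c3 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    ((i, i * 2, i * 3) for i in range(10000)),
)

results = {size: time_fetchall(conn, "SELECT * FROM table1", size) for size in (1, 10000)}
for size, (count, elapsed) in results.items():
    print(f"arraysize = {size:<6} --- {count} rows in {elapsed:.4f} seconds")

conn.close()
```

Running each setting several times and comparing the medians gives a more trustworthy picture than a single run, since a difference of a few percent can easily be noise.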

Nevertheless, I also observed a much longer fetch time compared with, for instance, a Java-based client using the same JDBC driver. My suggestion, as 9000 was saying, is to spend some time on your SQL query and let the database do the work. It is a faster and much more scalable solution.
