How to improve query time from AWS Aurora (RDS) when using Python 3 executed on AWS EC2 Amazon Linux

Submitted by 只谈情不闲聊 on 2019-12-25 01:25:56

Question


I'm using Python 3 with the pymysql package to query raw data from AWS Aurora, running on an EC2 instance with Amazon Linux, and I'd like to improve the performance significantly.

So far the job gets done, but it takes 150 seconds to return 2.3 million rows with the following code:

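import pandas as pd
import pymysql

# host, user, port, password and dbname are defined elsewhere.
# Keyword arguments are used because newer PyMySQL releases accept
# keyword arguments only and deprecate the passwd/db aliases.
conn = pymysql.connect(host=host, user=user, port=port,
                       password=password, database=dbname)

myQuery = '''
    SELECT * FROM fEvents f
    LEFT JOIN fParams fp
    ON f.id = fp.id
    WHERE f.DateTime BETWEEN '2019-01-24' AND '2019-02-28'
    '''

# Run the query and load the whole result set into a DataFrame.
df = pd.read_sql(myQuery, con=conn)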

When we ran the same query from the same EC2 instance using Node.js, we got an object with the 2.3 million results in only 20 seconds. Since the rest of the code is in Python 3, I'm struggling to improve the performance of my Python API.
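To narrow down where the 150 seconds go, it may help to time the driver fetch separately from whatever is done with the rows afterwards. The following is only a minimal sketch, not a fix: `time_fetch_stages` and its `build` parameter are hypothetical names introduced here, and the function assumes nothing more than a DB-API 2.0 connection such as the one returned by `pymysql.connect`.

```python
import time


def time_fetch_stages(conn, query, build=None):
    """Time the fetch and the post-processing of a query separately.

    conn  -- any DB-API 2.0 connection (pymysql, sqlite3, ...)
    build -- hypothetical optional callable (rows, column_names) -> object,
             e.g. something that turns the rows into a DataFrame
    """
    cur = conn.cursor()
    try:
        t0 = time.perf_counter()
        cur.execute(query)
        rows = cur.fetchall()          # pull the full result set to the client
        t1 = time.perf_counter()
        columns = [d[0] for d in cur.description]
    finally:
        cur.close()

    # Anything done after the fetch is timed as the "build" stage.
    result = build(rows, columns) if build is not None else rows
    t2 = time.perf_counter()

    timings = {"fetch_s": t1 - t0, "build_s": t2 - t1, "rows": len(rows)}
    return result, timings
```

With the connection and query from above, one could call `time_fetch_stages(conn, myQuery, build=lambda rows, cols: pd.DataFrame(list(rows), columns=cols))` and compare `fetch_s` with `build_s`. If most of the time is in `fetch_s`, the driver or the network is the likely bottleneck. If it is in `build_s`, the DataFrame construction is.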

I'd appreciate any suggestions or explanations.

Source: https://stackoverflow.com/questions/56116451/how-to-improve-query-time-from-aws-aurora-rds-when-using-python-3-executed-on
