Loading a MySQL table into Python takes a very long time compared to R

太阳男子 2021-02-08 23:06

I have a fairly large MySQL table: about 30M rows, 6 columns, and about 2 GB when loaded into memory.

I work with both Python and R. In R, I can load the table into memory in a few minutes, but the same read in Python takes around 40 minutes, and I would like to understand why the gap is so large.
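For reference, a minimal sketch of a read of this kind through a pure-Python driver such as pymysql, feeding pandas via SQLAlchemy; the credentials, database, and table name are placeholders:

    import pandas as pd
    from sqlalchemy import create_engine

    # Pure-Python driver: every row is decoded in Python, which is slow.
    engine = create_engine("mysql+pymysql://user:secret@localhost/mydb")
    df = pd.read_sql("SELECT * FROM big_table", engine)
    print(df.shape)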

2 Answers
  • 2021-02-08 23:52

    Thanks to helpful comments, particularly from @roganjosh, it appears that the issue is that the default MySQL connector is written in Python rather than C, which makes it very slow. The solution is to use MySQLdb, which is a native C connector.

    In my particular setup, running Python 3 with Anaconda, that wasn't possible because MySQLdb only supports Python 2. However, there is an implementation of MySQLdb for Python 3 under the name mysqlclient.
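    A minimal sketch of the same read through mysqlclient (installed with pip install mysqlclient, imported as MySQLdb); SQLAlchemy picks the C driver via the mysqldb dialect, and the connection details are placeholders:

        import pandas as pd
        from sqlalchemy import create_engine

        # mysqlclient decodes rows in C, so the same query runs much faster.
        engine = create_engine("mysql+mysqldb://user:secret@localhost/mydb")
        df = pd.read_sql("SELECT * FROM big_table", engine)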

    Using this implementation, the time to read the whole table is now down to about 5 minutes: not as fast as R, but much less than the 40 or so minutes it was taking before.

    I'm still open to suggestions that would make it faster, but my guess is that this is as good as it's going to get.

  • 2021-02-08 23:55

    There is also ultramysql, a pure C/C++ MySQL driver, which can be used through the umysqldb adapter. Neither project is active anymore, so they could be of use for a one-off job, but I would not use them in production.

    Since pymysql is a pure-Python driver, you may also try running it on PyPy.
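    A minimal sketch of such a read, run unchanged under PyPy, where the JIT compiles pymysql's row-decoding loop; connection details are placeholders:

        import pymysql

        # Plain DBAPI fetch; under PyPy the pure-Python decoding is JIT-compiled.
        conn = pymysql.connect(host="localhost", user="user",
                               password="secret", database="mydb")
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM big_table")
            rows = cur.fetchall()  # list of tuples, one per row
        conn.close()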
