Loading a MySQL table into Python takes a very long time compared to R

太阳男子 2021-02-08 23:06

I have a fairly large MySQL table: about 30M rows, 6 columns, and about 2 GB when loaded into memory.

I work with both Python and R. In R, I can load the table into memory a

2 answers
  •  遇见更好的自我
    2021-02-08 23:52

    Thanks to helpful comments, particularly from @roganjosh, it appears that the issue is that the default MySQL connector is written in pure Python rather than C, which makes it very slow. The solution is to use MySQLdb, a connector implemented in C.

    In my particular setup (Python 3 with Anaconda), that wasn't possible because MySQLdb only supports Python 2. However, there is a fork of MySQLdb that supports Python 3, published under the name mysqlclient.

    With mysqlclient, reading the whole table now takes about 5 minutes: not as fast as R, but much less than the 40 or so minutes it was taking before.

    I'm still open to suggestions that would make it faster, but my guess is that this is as good as it's going to get.
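
For readers who want to try the connector switch described above, here is a minimal sketch rather than the answerer's actual code. It assumes mysqlclient is installed (it is imported as `MySQLdb`); the helper names `connect_mysqlclient` and `load_table`, the chunk size, and the demo table are hypothetical. Because the loader only relies on the standard DB-API cursor interface, the demo at the bottom uses an in-memory `sqlite3` database as a stand-in so the sketch runs without a MySQL server.

```python
import sqlite3


def connect_mysqlclient(host, user, password, database):
    """Open a connection through mysqlclient (hypothetical helper)."""
    # mysqlclient installs the MySQLdb module: pip install mysqlclient
    import MySQLdb

    # `passwd` and `db` are the classic MySQLdb keyword names.
    return MySQLdb.connect(host=host, user=user, passwd=password, db=database)


def load_table(conn, query, chunk_size=100_000):
    """Run `query` on a DB-API connection and return all rows as a list."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        rows = []
        while True:
            # fetchmany returns an empty sequence once the result is exhausted.
            chunk = cursor.fetchmany(chunk_size)
            if not chunk:
                break
            rows.extend(chunk)
        return rows
    finally:
        cursor.close()


if __name__ == "__main__":
    # Stand-in database so the sketch runs without a MySQL server.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE t (id INTEGER, value REAL)")
    demo.executemany(
        "INSERT INTO t VALUES (?, ?)", [(i, i * 0.5) for i in range(10)]
    )
    print(len(load_table(demo, "SELECT id, value FROM t", chunk_size=4)))
```

Against a real server, the call would look like `load_table(connect_mysqlclient(...), "SELECT ...")` with your own credentials and query. Whether this matches the timings reported above depends on the setup, so treat it as a starting point for measurement.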
