Fastest way to subset - data.table vs. MySQL

情话喂你 2020-12-14 08:45

I'm an R user, and I frequently find that I need to write functions that require subsetting large datasets (10s of millions of rows). When I apply such functions over a la…

2 Answers

醉梦人生 (OP)

2020-12-14 09:19

I am not an R user, but I know a little about databases. I believe that MySQL (or any other reputable RDBMS) will actually perform your subsetting operations faster (usually by something like an order of magnitude), barring any additional computation involved in the subsetting process.

I suspect the performance lag you see on small datasets comes from the cost of opening the connection and the initial push of the data to MySQL. There is likely a size below which the connection overhead and data-transfer time add more to the cost of your operation than MySQL saves you.

For datasets above some minimum size, however, it seems likely that this cost is outweighed by the sheer speed of the database.
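
The trade-off described in the last two paragraphs can be written as a simple cost model. This is a minimal sketch under assumed conditions: a one-time cost `push_ms` for connecting and transferring the data, plus a fixed per-subset cost in memory (`mem_query_ms`) and in the database (`db_query_ms`). All names and numbers here are hypothetical, not measurements. Under these assumptions the database wins once the number of subsets k satisfies k > push_ms / (mem_query_ms - db_query_ms).

```python
def total_cost_in_memory(n_queries, mem_query_ms):
    """Every subset is computed in code: no setup, a fixed cost per query."""
    return n_queries * mem_query_ms


def total_cost_database(n_queries, push_ms, db_query_ms):
    """Pay once to connect and push the data, then a (hopefully cheaper)
    cost per query. The per-query cost is an assumed constant."""
    return push_ms + n_queries * db_query_ms


def break_even_queries(push_ms, mem_query_ms, db_query_ms):
    """Smallest number of queries at which the database is strictly cheaper,
    or None if a database query is never cheaper than an in-memory one."""
    saving_per_query = mem_query_ms - db_query_ms
    if saving_per_query <= 0:
        return None  # the one-time push cost is never paid back
    # Need push_ms < k * saving_per_query, i.e. k > push_ms / saving_per_query.
    return int(push_ms // saving_per_query) + 1
```

For example, with a hypothetical 10 s push, 50 ms per in-memory subset and 10 ms per database subset, `break_even_queries(10_000, 50, 10)` returns 251: at 250 queries both approaches cost 12,500 ms, and from the 251st query on the database is ahead. The model is deliberately crude; in practice both the push cost and a full in-memory scan grow with the number of rows, while an indexed database lookup typically grows much more slowly.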

My understanding is that SQL can perform most fetching and sorting operations much, much faster than iterative operations in code. But you must also factor in the cost of the connection and (in this case) the initial transfer of the data over the network.
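
To make the "fetching is faster in SQL" point concrete, here is a minimal Python sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption: there is no server and no network here, so it shows only the lookup side, not the connection and transfer cost), and the table `t`, column `grp`, and all helper functions are hypothetical. It contrasts subsetting by scanning every row in code with subsetting through a B-tree index, which is one common reason a database can avoid touching every row on every query.

```python
import random
import sqlite3
import time


def build_data(n_rows, n_groups, seed=0):
    """Hypothetical test data: (id, grp, val) rows, grp drawn uniformly."""
    rng = random.Random(seed)
    return [(i, rng.randrange(n_groups), rng.random()) for i in range(n_rows)]


def subset_by_scan(rows, grp):
    """Subsetting in code: every row is inspected on every query (O(n))."""
    return [r for r in rows if r[1] == grp]


def make_indexed_db(rows):
    """Load the rows into an in-memory SQLite table and index the grp column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, grp INTEGER, val REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_t_grp ON t (grp)")
    conn.commit()
    return conn


def subset_by_index(conn, grp):
    """Subsetting in the database: the index locates the matching rows
    without reading the whole table."""
    cur = conn.execute(
        "SELECT id, grp, val FROM t WHERE grp = ? ORDER BY id", (grp,)
    )
    return cur.fetchall()


def query_plan(conn):
    """SQLite's own description of how it will run the subsetting query."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id, grp, val FROM t WHERE grp = ?", (0,)
    ).fetchall()
    return " | ".join(str(row[-1]) for row in plan)


if __name__ == "__main__":
    rows = build_data(200_000, 1_000)
    conn = make_indexed_db(rows)
    # Both approaches must return the same subset.
    assert subset_by_index(conn, 42) == subset_by_scan(rows, 42)
    print(query_plan(conn))  # exact wording varies between SQLite versions

    # Rough timing of 50 subsets each; numbers depend entirely on the machine.
    for name, fn in [("scan", lambda g: subset_by_scan(rows, g)),
                     ("index", lambda g: subset_by_index(conn, g))]:
        start = time.perf_counter()
        for g in range(50):
            fn(g)
        print(f"{name}: {time.perf_counter() - start:.3f} s for 50 subsets")
```

The query plan should report that SQLite searches the table using `idx_t_grp` rather than scanning it. The same idea is available without a database: a keyed R `data.table` (via `setkey`) subsets by binary search instead of a full vector scan, so a fair comparison would pit indexed lookups against indexed lookups.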

I will be interested to hear what others have to say . . .
