When should I use setDT() instead of data.table() to create a data.table?

I am having difficulty grasping the essence of the setDT() function. As I read code on SO, I frequently come across the use of setDT() to create a

无人及你

2020-12-07 19:33

Update:

@Roland makes some good points in the comments section, and the post is better for them. While I originally focused on memory overflow issues, he pointed out that even if this doesn't happen, memory management of various copies takes substantial time, which is a more common everyday concern. Examples of both issues have now been added as well.

I like this question on Stack Overflow because I think it is really about avoiding memory overflow in R when dealing with larger data sets.
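One rough way to see the time cost of copying that @Roland points out is to time both conversions on a moderately large data.frame. This is a minimal sketch, assuming the data.table package is installed. The size `n` and the columns of `df1`/`df2` are arbitrary illustrative choices, and timings vary by machine, so no figures are shown.

```r
library(data.table)

# Two identical, moderately large data.frames (size chosen arbitrarily)
n <- 1e6
df1 <- data.frame(id = seq_len(n), value = runif(n))
df2 <- data.frame(id = seq_len(n), value = runif(n))

# as.data.table() deep-copies every column into a new object
t_copy <- system.time(dt1 <- as.data.table(df1))

# setDT() converts df2 in place; its column vectors are not copied
t_ref <- system.time(setDT(df2))

print(t_copy)
print(t_ref)
```

The gap grows with the number of rows and columns, and so does the peak memory, because `as.data.table()` briefly holds the original and the copy at the same time.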

遇见更好的自我

2020-12-07 19:40
setDT() is not a replacement for data.table(). It's a more efficient alternative to as.data.table() that only works with certain types of objects: lists, data.frames and data.tables.
- mydata <- as.data.table(mydata) will copy the object behind mydata, convert the copy to a data.table, then change the mydata symbol to point to the copy.
- setDT(mydata) will change the object behind mydata to a data.table. No copying is done.
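The difference between the two bullets can be checked with data.table's `address()` function. In this illustrative sketch, `mydata` is a small made-up data.frame. It compares the address of one column rather than of the whole object, because `setDT()` may still reallocate the outer list that holds the column pointers while reusing the column vectors themselves.

```r
library(data.table)

# A plain data.frame, standing in for data you did not create yourself
mydata <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))
col_before <- address(mydata$value)   # memory address of the 'value' column

# as.data.table() returns a converted copy; mydata itself is untouched
copied <- as.data.table(mydata)
col_copied <- address(copied$value)

# setDT() converts mydata itself and reuses its column vectors
setDT(mydata)
col_after <- address(mydata$value)

identical(col_before, col_copied)  # FALSE: the column was copied
identical(col_before, col_after)   # TRUE: the column was reused
```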
So when is setDT() realistically useful? When you can't control the class of the original data. For instance, most packages for working with databases return data.frame output. In that case, your code would look something like:
```
library(DBI)
library(data.table)
mydata <- dbGetQuery(conn, "SELECT * FROM mytable")  # conn: an open DBI connection; returns a data.frame
setDT(mydata)                                        # Make it a data.table, in place
```
When should you use as.data.table(x)? Whenever x isn't a list or data.frame. The most common use is for matrices.
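A minimal sketch of the matrix case, assuming a small made-up integer matrix `m`: `setDT()` rejects it with an error (caught here so the script keeps running), while `as.data.table()` builds a new data.table from it.

```r
library(data.table)

# A matrix is neither a list nor a data.frame
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# setDT() refuses a matrix; capture the error message instead of stopping
res <- tryCatch(setDT(m), error = function(e) conditionMessage(e))
print(res)

# as.data.table() creates a new data.table with columns a, b and c
dt <- as.data.table(m)
print(dt)
```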