I am having difficulty grasping the essence of the setDT() function. As I read code on SO, I frequently come across the use of setDT() to create a
Update:
@Roland makes some good points in the comments section, and the post is better for them. While I originally focused on memory overflow issues, he pointed out that even if this doesn't happen, memory management of various copies takes substantial time, which is a more common everyday concern. Examples of both issues have now been added as well.
I like this question on Stack Overflow because I think it is really about avoiding memory overflow in R when dealing with larger data sets.
setDT() is not a replacement for data.table(). It's a more efficient replacement for as.data.table() that can be used with certain types of objects.
mydata <- as.data.table(mydata) will copy the object behind mydata, convert the copy to a data.table, then change the mydata symbol to point to the copy.

setDT(mydata) will change the object behind mydata to a data.table. No copying is done.

So what's a realistic situation to use setDT()? When you can't control the class of the original data. For instance, most packages for working with databases give data.frame output. In that case, your code would be something like:
mydata <- dbGetQuery(conn, "SELECT * FROM mytable") # Returns a data.frame
setDT(mydata) # Make it a data.table
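
To see the copy versus in-place difference without a database, here is a minimal sketch, assuming the data.table package is installed. The data frame df and its columns are made up for illustration. It uses data.table's address() helper to check whether a column still lives at the same memory location after conversion.

```r
library(data.table)

# Hypothetical data standing in for output whose class you don't control
df <- data.frame(id = c(1.5, 2.5, 3.5), label = c("a", "b", "c"))
col_before <- address(df$id)   # memory address of the id column

# as.data.table() returns a new object; df itself stays a plain data.frame
dt_copy <- as.data.table(df)
class_after_copy <- class(df)
# TRUE when the id column of the result lives at a different address
copied_col_moved <- address(dt_copy$id) != col_before

# setDT() converts df itself, by reference
setDT(df)
# The id column should still sit at its original address
same_col_after_setDT <- identical(address(df$id), col_before)

print(class_after_copy)
print(copied_col_moved)
print(is.data.table(df))
print(same_col_after_setDT)
```

Note that the check is on a column rather than on df as a whole: the point of setDT() is that the column data is not duplicated, which is what matters for memory and time on large tables.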
When should you use as.data.table(x)? Whenever x isn't a list or data.frame. The most common use is for matrices.
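
As a rough illustration of the matrix case, assuming the data.table package is installed (the matrix m and its column names are invented for the example): as.data.table() converts the matrix, while setDT() refuses it because a matrix is neither a list nor a data.frame.

```r
library(data.table)

# Hypothetical 3 x 2 integer matrix with named columns
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

# as.data.table() builds a new data.table from the matrix columns
dt <- as.data.table(m)
print(dt)

# setDT() only accepts lists, data.frames and data.tables,
# so calling it on a matrix raises an error; catch it to show that
setDT_failed <- tryCatch({
  setDT(m)
  FALSE
}, error = function(e) TRUE)
print(setDT_failed)
```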