When should I use setDT() instead of data.table() to create a data.table?

盖世英雄少女心 2020-12-07 18:50

I am having difficulty grasping the essence of the setDT() function. As I read code on SO, I frequently come across the use of setDT() to create a data.table.

2 answers
  • 2020-12-07 19:33

    Update:

    @Roland makes some good points in the comments section, and the post is better for them. While I originally focused on memory overflow issues, he pointed out that even if this doesn't happen, memory management of various copies takes substantial time, which is a more common everyday concern. Examples of both issues have now been added as well.

    I like this question on Stack Overflow because I think it is really about avoiding memory overflow in R when dealing with larger data sets.
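    The answer's own examples are not reproduced on this page. As a rough, hypothetical sketch of the time cost described above (assuming the data.table package is installed; the one-million-row data.frame and its column names are invented for illustration), you can time a copying conversion against an in-place one. Absolute timings depend entirely on the machine and the data size.

```r
library(data.table)

# Hypothetical test data: a data.frame with one million rows.
n <- 1e6
big_df <- data.frame(id = seq_len(n), value = runif(n))

# as.data.table() copies every column, then converts the copy.
t_copy <- system.time(dt_copy <- as.data.table(big_df))

# setDT() converts big_df itself; its column vectors are not duplicated.
t_inplace <- system.time(setDT(big_df))

# Elapsed seconds for each approach (machine-dependent).
print(t_copy["elapsed"])
print(t_inplace["elapsed"])
```

    The gap grows with the size of the data, and the copy also needs enough free memory to hold a second full set of columns.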

  • 2020-12-07 19:40

    setDT() is not a replacement for data.table(). It's a more efficient replacement for as.data.table(), but it only works with certain types of objects: lists, data.frames, and data.tables.

    • mydata <- as.data.table(mydata) will copy the object behind mydata, convert the copy to a data.table, then change the mydata symbol to point to the copy.
    • setDT(mydata) will change the object behind mydata to a data.table in place. The column data are not copied.
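    One way to observe this difference is data.table's address() helper, which returns the memory address of an R object. In this sketch (the data.frame and its columns are invented for illustration) the address of column a is compared before and after each conversion: the object returned by as.data.table() holds its column at a new address, while setDT() leaves the original column vector where it was.

```r
library(data.table)

# Hypothetical example data.
mydata <- data.frame(a = c(10, 20, 30), b = c("x", "y", "z"))
col_before <- address(mydata$a)   # memory address of column a

# as.data.table(): works on a deep copy, so its column lives elsewhere.
converted <- as.data.table(mydata)
col_copied <- address(converted$a)

# setDT(): converts mydata in place; column a is the very same vector.
setDT(mydata)
col_after <- address(mydata$a)

col_copied == col_before  # FALSE: the column data were copied
col_after == col_before   # TRUE: the column was not copied
```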

    So what's a realistic situation to use setDT()? When you can't control the class of the original data. For instance, most packages for working with databases give data.frame output. In that case, your code would be something like

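    library(DBI)         # provides dbGetQuery()
    library(data.table)  # provides setDT()
    mydata <- dbGetQuery(conn, "SELECT * FROM mytable")  # conn: an open DBI connection; returns a data.frame
    setDT(mydata)                                        # Make it a data.table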
    
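    To try the same pattern without a database, any function that returns a plain data.frame will do. This sketch uses base R's read.csv() on an inline string as a hypothetical stand-in for dbGetQuery(); the column names are made up.

```r
library(data.table)

# read.csv() stands in for any function that hands back a plain data.frame
# (such as DBI::dbGetQuery()); no database connection is needed here.
mydata <- read.csv(text = "id,name\n1,alpha\n2,beta")
class(mydata)  # "data.frame"

setDT(mydata)  # convert in place; no reassignment needed
class(mydata)  # "data.table" "data.frame"

# data.table syntax now works directly on mydata.
mydata[id > 1, name]
```

    Because setDT() works by reference, writing mydata <- setDT(mydata) is unnecessary.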

    When should you use as.data.table(x)? Whenever x isn't a list or data.frame. The most common use is for matrices.
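    A short sketch of the matrix case, using an invented 2 x 3 matrix: setDT() rejects the matrix because it is not a list, while as.data.table() builds a new data.table from it, taking the column names from the matrix's colnames.

```r
library(data.table)

# Hypothetical 2 x 3 integer matrix with column names.
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("x", "y", "z")))

# A matrix is not a list, so setDT() stops with an error.
res <- try(setDT(m), silent = TRUE)
inherits(res, "try-error")  # TRUE

# as.data.table() converts it into a new object; m itself is unchanged.
dt_m <- as.data.table(m)
print(dt_m)
```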
