In the tidyverse, what is the difference between an object of class “tbl” and “tbl_df”?

Posted by 老子叫甜甜 on 2020-01-15 11:24:12

Question


When creating a tibble,

tbl <- tibble(A=1:5, B=6:10)

the result of

class(tbl)

is

[1] "tbl_df"     "tbl"        "data.frame"

I'm used to seeing this as I use dplyr quite a bit. But when is an object just a "tbl" (and not a "tbl_df") or vice versa? I'd just like to know a bit more about the difference, if any.

Any documentation would be much appreciated!


Answer 1:


You can think of a "tibble" as an interface. If an object can respond to all the tibble actions, then you can treat it as a tibble. R doesn't have strong typing: an S3 class is just a label in the object's class vector, and methods are picked by walking that vector from left to right.

So tbl is the generic tibble, and tbl_df is a specific type of tibble that stores its data in a data.frame.
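As a minimal sketch of how to check this on the tibble from the question (assuming the tibble package is installed; the variable names are only illustrative), inherits() tests whether a class appears anywhere in the class vector:

```r
library(tibble)

df <- tibble(A = 1:5, B = 6:10)

# The class vector is ordered from most specific to most general.
class(df)

# inherits() checks whether a class name appears anywhere in that vector.
is_tbl_df <- inherits(df, "tbl_df")
is_tbl    <- inherits(df, "tbl")
is_df     <- inherits(df, "data.frame")

# A plain data.frame carries neither of the tbl classes.
plain <- data.frame(A = 1:5, B = 6:10)
plain_is_tbl <- inherits(plain, "tbl")
```

The first three checks are TRUE for the tibble, while the last one is FALSE for the plain data.frame.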

There are other packages, like dtplyr, that let an object act like a tibble while storing the data in a data.table. For example:

library(dtplyr)
# tbl_dt() is from dtplyr versions before 1.0.0; later releases use lazy_dt() instead
ds <- tbl_dt(mtcars)
class(ds)
# [1] "tbl_dt"     "tbl"        "data.table" "data.frame"

There's also the dbplyr package, which allows you to use a SQL database back end. For example:

library(dplyr)
# RSQLite takes the database location as `dbname`; ":memory:" creates an in-memory database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)
cars_db <- tbl(con, "mtcars")
class(cars_db)
# [1] "tbl_dbi"  "tbl_sql"  "tbl_lazy" "tbl"

So again we see that this object can generally act as a tibble, but it has other classes so that it can try to do all its work in the database engine, rather than manipulating the data in R itself.

So there's not really a "difference" between tbl and tbl_df. The latter just says how the tibble is actually being implemented so the behavior can differ (be more optimized).
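To illustrate how the extra classes let behavior differ, here is a small base R sketch of S3 dispatch. The generic `describe_backend()` and its methods are hypothetical and not part of dplyr or tibble, and the objects are given their class vectors by hand purely for demonstration:

```r
# Hypothetical generic for illustration only.
describe_backend <- function(x, ...) UseMethod("describe_backend")

# Fallback that applies to anything carrying the "tbl" class.
describe_backend.tbl <- function(x, ...) "generic tbl"

# More specific method; NextMethod() falls through to the "tbl" method.
describe_backend.tbl_df <- function(x, ...) {
  paste("data frame backed, then", NextMethod())
}

# Hand-built objects that only mimic the class vectors discussed above.
local_tbl <- data.frame(A = 1:3)
class(local_tbl) <- c("tbl_df", "tbl", "data.frame")

lazy_tbl <- structure(list(), class = c("tbl_lazy", "tbl"))

local_result <- describe_backend(local_tbl)
lazy_result  <- describe_backend(lazy_tbl)
```

The object with "tbl_df" in front picks up the specialized method first, while the one without it falls back to the "tbl" method. Back ends can use the same mechanism to override individual verbs.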

For more information, you can check out the tibble vignette or the extending tibble vignette.



Source: https://stackoverflow.com/questions/51749664/in-the-tidyverse-what-is-the-difference-between-an-object-of-class-tbl-and-t
