Do data.frame and data.table objects in R have the same type?

伪装坚强ぢ 2020-12-21 11:59

I am still very new to R and recently came across something I do not understand. Do data.frame and data.table have the same type? Can an object ha

2 Answers
  • 2020-12-21 12:43

    data.table and data.frame are different classes, but they are related through inheritance: data.table inherits from data.frame and extends its capabilities. You can also see this after converting cars to the data.table class, because the underlying type is still a list:

    R> library(data.table)
    R> cars <- as.data.table(cars)   # convert the built-in data.frame
    R> typeof(cars)
    [1] "list"      # same as for a data.frame
    
    R> mode(cars)
    [1] "list"      # likewise
    

    More information here or just google for "inheritance".
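
    As a minimal sketch of the inheritance described above (assuming the data.table package is installed; CARS is just an illustrative name), the class vector lists both classes while the underlying type stays a list:

    ```r
    library(data.table)

    CARS <- as.data.table(cars)       # copy of the built-in `cars` data.frame

    class(cars)                       # "data.frame"
    class(CARS)                       # "data.table" "data.frame"

    # inherits() checks whether a class appears anywhere in the class vector
    is_df <- inherits(CARS, "data.frame")   # TRUE
    is_dt <- inherits(cars, "data.table")   # FALSE: inheritance goes one way only

    # typeof() looks at the underlying storage, which is a list in both cases
    typeof(cars)                      # "list"
    typeof(CARS)                      # "list"
    ```

    So "type" and "class" answer different questions: the type describes how the object is stored, the class vector describes how functions treat it.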

  • 2020-12-21 12:48

    It is not clear what you mean by your line "I obviously can't apply functions that apply to data.frames and not data.table".

    Many functions work as you would expect, whether applied to a data.frame or to a data.table. In particular, if you read the help page at ?data.table, you will find this line in the first paragraph of the description:

    Since a data.table is a data.frame, it is compatible with R functions and packages that only accept data.frame.

    You can test this out yourself:

    library(data.table)
    CARS <- data.table(cars)
    

    The following should all give you the same results. They aren't the "data.table" way of doing things; they are just a few examples off the top of my head to show that many (most?) functions can be used with a data.table the same way you would use them with a data.frame (although at that point you miss out on all the great stuff that data.table has to offer).

    with(cars, tapply(dist, speed, FUN = mean))
    with(CARS, tapply(dist, speed, FUN = mean))
    aggregate(dist ~ speed, cars, as.vector)
    aggregate(dist ~ speed, CARS, as.vector)
    colSums(cars)
    colSums(CARS)
    as.matrix(cars)
    as.matrix(CARS)
    t(cars)
    t(CARS)
    table(cut(cars$speed, breaks=3), cut(cars$dist, breaks=5))
    table(cut(CARS$speed, breaks=3), cut(CARS$dist, breaks=5))
    cars[cars$speed == 4, ]
    CARS[CARS$speed == 4, ]
    

    However, there are some cases in which this won't work. Compare:

    cars[cars$speed == 4, 1]
    CARS[CARS$speed == 4, 1]
    

    For a better understanding of that, I recommend reading the FAQs. In particular, a couple of relevant points have been summarized at this question: what you can do with data.frame that you can't in data.table.
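
    To make the comparison above concrete, here is a small sketch (assuming the data.table package is installed). With a data.frame, selecting one column by number drops the result to a plain vector. With a data.table, the second argument is handled by data.table's own rules, and what comes back depends on the installed version, so no output is shown for that call:

    ```r
    library(data.table)

    CARS <- data.table(cars)   # same data as the built-in `cars` data.frame

    # data.frame: a single column selected by number is dropped to a vector
    df_result <- cars[cars$speed == 4, 1]
    class(df_result)           # "numeric"

    # data.table: `j` follows data.table's own semantics, so the result
    # differs from the data.frame case (and has changed between versions)
    dt_result <- CARS[CARS$speed == 4, 1]
    class(dt_result)

    # Forms that return the same plain vector for both classes
    df_speed <- cars[cars$speed == 4, ][["speed"]]
    dt_speed <- CARS[CARS$speed == 4, ][["speed"]]
    identical(df_speed, dt_speed)   # TRUE
    ```

    Extracting the column with [[ after filtering the rows is one way to write code that does not care which of the two classes it receives.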


    If your question is, more generally, "Can an object have more than one class?", then you've seen from your own exploration that, yes, it can. For more about that, you can read this page from Hadley's devtools wiki.
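
    As an illustration of an object with more than one class, here is a base R sketch using S3 dispatch. All names in it (describe, "child", "parent") are made up for the example and do not come from any package:

    ```r
    # A hypothetical generic with methods for a made-up "parent" class
    describe <- function(x, ...) UseMethod("describe")
    describe.default <- function(x, ...) "something else"
    describe.parent  <- function(x, ...) "a parent"

    # An object whose class vector has two entries, most specific first
    obj <- structure(list(), class = c("child", "parent"))

    # No describe.child exists yet, so dispatch falls through to describe.parent
    before <- describe(obj)    # "a parent"

    # Once a method for the first class exists, it is used instead;
    # NextMethod() hands over to the next class in the vector
    describe.child <- function(x, ...) paste("a child, which is also", NextMethod())
    after <- describe(obj)     # "a child, which is also a parent"
    ```

    This is the same mechanism that lets a data.table, whose class vector is c("data.table", "data.frame"), fall back on data.frame methods when no data.table method is defined.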


    Classes also affect things like how objects are printed and how they interact with other functions.

    Consider the rle function. The class of its result is "rle", while its structure shows that the object is a list.

    > x <- rev(rep(6:10, 1:5))
    > y <- rle(x)
    > x
     [1] 10 10 10 10 10  9  9  9  9  8  8  8  7  7  6
    > y
    Run Length Encoding
      lengths: int [1:5] 5 4 3 2 1
      values : int [1:5] 10 9 8 7 6
    > class(y)
    [1] "rle"
    > str(y)
    List of 2
     $ lengths: int [1:5] 5 4 3 2 1
     $ values : int [1:5] 10 9 8 7 6
     - attr(*, "class")= chr "rle"
    

    As the length of each list item is the same, you might expect that you can conveniently use data.frame() to convert it to a data.frame. Let's try:

    > data.frame(y)
    Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
      cannot coerce class ""rle"" to a data.frame
    > unclass(y)
    $lengths
    [1] 5 4 3 2 1
    
    $values
    [1] 10  9  8  7  6
    
    > data.frame(unclass(y))
      lengths values
    1       5     10
    2       4      9
    3       3      8
    4       2      7
    5       1      6
    

    Or, let's add another class to the object and try:

    > class(y) <- c(class(y), "list")
    > y ## Printing is not affected
    Run Length Encoding
      lengths: int [1:5] 5 4 3 2 1
      values : int [1:5] 10 9 8 7 6
    > data.frame(y) ## But interaction with other functions is
      lengths values
    1       5     10
    2       4      9
    3       3      8
    4       2      7
    5       1      6
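
    A likely explanation for the transcript above, offered as a sketch and not a definitive account: data.frame() converts its arguments through the as.data.frame generic, which dispatches on the class vector. With only "rle" in that vector, no matching method is found and the default one fails; once "list" is appended, a list method can be picked up. The snippet below uses base R only:

    ```r
    y <- rle(rev(rep(6:10, 1:5)))

    # The storage is a list, but the class vector does not say so
    is.list(y)                          # TRUE  (based on the underlying type)
    had_list <- inherits(y, "list")     # FALSE (based on the class attribute)

    # Append "list" so that dispatch can fall through to list methods
    class(y) <- c(class(y), "list")
    has_list <- inherits(y, "list")     # TRUE

    # print() still finds print.rle first; data.frame() now succeeds
    df <- data.frame(y)
    names(df)                           # "lengths" "values"
    ```

    The order of the class vector matters: "rle" stays first, so rle-specific methods such as printing still win, and "list" only acts as a fallback.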
    