Dataframe within dataframe?

孤街浪徒 2021-01-04 02:38

Consider this example:

df <- data.frame(id=1:10,var1=LETTERS[1:10],var2=LETTERS[6:15])

fun.split <- function(x) tolower(as.character(x))
df$new.letter         


        
3 answers
  •  一生所求
    2021-01-04 02:59

    In this case R doesn't behave as one would expect, but maybe if we dig deeper we can work it out. What is a data frame? As Norman Matloff says in his book (chapter 5):

    a data frame is a list, with the components of that list being equal-length vectors
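
    The quoted definition can be checked directly. The snippet below is a minimal sketch that rebuilds the df from the question; NROW() is used rather than length() on the assumption that a component may itself be a matrix, in which case it is the number of rows that has to match.

    ```r
    df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])

    is.list(df)        # TRUE: a data frame is stored as a list
    typeof(df)         # "list"
    sapply(df, NROW)   # every component has 10 rows
    ```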

    The following code might help to understand this.

    class(df$new.letters)
    [1] "matrix"
    
    
    str(df)
    'data.frame':   10 obs. of  4 variables:
     $ id         : int  1 2 3 4 5 6 7 8 9 10
     $ var1       : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
     $ var2       : Factor w/ 10 levels "F","G","H","I",..: 1 2 3 4 5 6 7 8 9 10
     $ new.letters: chr [1:10, 1:2] "a" "b" "c" "d" ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : NULL
      .. ..$ : chr  "var1" "var2"
    

    Maybe the reason why it looks strange is in the print methods. Consider this:

    colnames(df$new.letters)
    [1] "var1" "var2"
    

    Maybe there is something in the print methods that combines the sub-names of objects and displays them all.

    For example here the vectors that constitute the df are:

    names(df)
    [1] "id"          "var1"        "var2"        "new.letters"
    

    But in this case the vector new.letters also has a dim attribute (in fact it is a matrix) whose columns are named var1 and var2 too. See this code:

    attributes(df$new.letters)
    $dim
    [1] 10  2
    
    $dimnames
    $dimnames[[1]]
    NULL
    
    $dimnames[[2]]
    [1] "var1" "var2"
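
    To see that the dim attribute alone is what turns a vector into a matrix, here is a small sketch on a throwaway vector x (a hypothetical name, not part of the question):

    ```r
    x <- letters[1:4]
    is.matrix(x)                                   # FALSE: a plain character vector

    dim(x) <- c(2, 2)                              # add a dim attribute
    dimnames(x) <- list(NULL, c("var1", "var2"))   # name the two columns
    is.matrix(x)                                   # TRUE
    colnames(x)                                    # "var1" "var2"
    ```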
    

    But when we print, we see all of them as if they were separate vectors (and so separate columns of the data.frame!).
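
    One way to compare what is stored with what gets displayed is sketched below. Because the assignment in the question is cut off, the matrix column is built here with cbind() as a stand-in (an assumption); it has the same shape as the one reported by str(). as.matrix() is used only to expose the flattened column labels; that print arrives at the same labels is my reading of base R, not something the question confirms.

    ```r
    df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])

    # Stand-in for the truncated assignment in the question (assumption):
    # a 10 x 2 character matrix with columns named var1 and var2.
    df$new.letters <- cbind(var1 = letters[1:10], var2 = letters[6:15])

    length(df)                  # 4: only four components are stored
    is.matrix(df$new.letters)   # TRUE: the fourth one is a single matrix

    # Flattening the data frame gives each matrix column its own compound label
    colnames(as.matrix(df))
    ```

    So the data frame still has four columns; the extra ones appear only once the matrix component is spread out for display.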

    Edit: Print methods

    Just out of curiosity, and to add to this question, I looked inside the methods of the print function:

    methods(print)
    

    The previous code produces a very long list of methods for the generic function print. I did not spot one for data.frame in it, although base R does define print.data.frame, so that is the method to check first. The one I looked at instead (I am sure there is a more technical way to find this out) is listof.

    getS3method("print", "listof")
    function (x, ...) 
    {
        nn <- names(x)
        ll <- length(x)
        if (length(nn) != ll) 
            nn <- paste("Component", seq.int(ll))
        for (i in seq_len(ll)) {
            cat(nn[i], ":\n")
            print(x[[i]], ...)
            cat("\n")
        }
        invisible(x)
    }
    
    
    

    Maybe I am wrong, but it seems to me that this code might hold useful information about why that happens, specifically the if (length(nn) != ll) check.
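
    As a possible follow-up, the sketch below checks which class print would dispatch on for df and whether base R registers a print method for that class. It assumes only base R functions (class, inherits, getS3method).

    ```r
    df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])

    class(df)                  # "data.frame"
    inherits(df, "listof")     # FALSE: print.listof is not selected for df

    # getS3method() raises an error if no such method is registered
    m <- getS3method("print", "data.frame")
    is.function(m)             # TRUE
    ```

    If that holds on your installation, the body of print.data.frame (and of format.data.frame, which it calls) is probably the better place to look for how matrix columns get their labels.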
