Why does item assignment in a non-existent data.frame column work?

渐次进展 2021-01-18 07:58

Inspired by Q6437164: can someone explain to me why the following works:

iriscopy <- iris  # or whatever other data.frame
iriscopy$someNonExistantColumn[1] <- 15


        
2 answers
  • 2021-01-18 08:10

    I think the answer is that it doesn't work.

    I consider assignment via $newcol to be the standard way to create a new column. For example:

    iris$newcol <- 1
    

    will create a new column in the iris data.frame. All values will be 1, because of vector recycling.
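
    A minimal sketch of that recycling, working on a copy (df is a hypothetical name) so the built-in iris stays untouched. It also shows, as far as I understand the data.frame replacement method, that a value whose length does not divide the number of rows is rejected rather than recycled:

    ```r
    df <- iris                       # copy of the built-in dataset
    df$newcol <- 1                   # length-1 value is recycled to nrow(df)
    recycled <- length(df$newcol) == nrow(df) && all(df$newcol == 1)

    # 150 rows cannot be filled evenly by a length-4 value, so this errors
    res <- tryCatch({ df$bad <- 1:4; "ok" }, error = function(e) "error")
    ```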

    This creation of a new column is triggered when the expression evaluates to NULL. From ?"$<-":

    • "When $<- is applied to a NULL x, it first coerces x to list(). This is what also happens with [[<- if the replacement value value is of length greater than one: if value has length 1 or 0, x is first coerced to a zero-length vector of the type of value."

    So I think what happens here is that the expression evaluates to NULL, and this triggers the code to create a new column, which in turn uses vector recycling to fill the values.
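
    The quoted coercion rule can be checked in isolation. A small sketch, where x is a hypothetical variable used only for illustration:

    ```r
    x <- NULL
    x$a <- 1:3                           # `$<-` on NULL: x is first coerced to list()
    is.list(x)

    y <- "[<-"(NULL, 1, value = 15)      # `[<-` on NULL yields a plain vector
    ```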

    Edit

    The parsing probably works using $-assign $<- rather than bracket-assign [<-. Compare:

    head(`$<-`(iris, newcol, 1))
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species newcol
    1          5.1         3.5          1.4         0.2  setosa      1
    2          4.9         3.0          1.4         0.2  setosa      1
    3          4.7         3.2          1.3         0.2  setosa      1
    4          4.6         3.1          1.5         0.2  setosa      1
    5          5.0         3.6          1.4         0.2  setosa      1
    6          5.4         3.9          1.7         0.4  setosa      1
    

    But bracket assign produces an error:

    head(`[<-`(iris, newcol, 1))
    Error in head(`[<-`(iris, newcol, 1)) : 
      error in evaluating the argument 'x' in selecting a method for function 'head': Error in is.atomic(value) : 'value' is missing
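
    One caveat, offered as a hedged reading rather than a definitive diagnosis: the error above looks like an argument-matching problem. The data.frame method of [<- takes (x, i, j, value), so the third positional argument is matched to j and value is left missing. A sketch that names value explicitly (df is a hypothetical name):

    ```r
    # Equivalent to: df <- iris; df["newcol"] <- 1
    df <- "[<-"(iris, "newcol", value = 1)
    head(df)
    ```

    Called this way, bracket-assign also creates the column and recycles the value.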
    
  • 2021-01-18 08:18

    The R language definition manual gives us a pointer to how R evaluates expressions of the form:

    x$foo[1] <- 15
    

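    namely, it is as if we had called the following (the method names are schematic; as the help page quoted further down notes, there is no data.frame method for $):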

    `*tmp*` <- x
    x <- "$<-.data.frame"(`*tmp*`, name = "foo", 
                          value = "[<-.data.frame"("$.data.frame"(`*tmp*`, "foo"), 
                                                   1, value = 15))
    rm(`*tmp*`)
    

    The middle bit might be easier to follow if we drop, for purposes of exposition, the actual methods used:

    x <- "$<-"(`*tmp*`, name = "foo", 
               value = "[<-"("$"(`*tmp*`, "foo"), 1, value = 15))
    

    To go back to your example using iris, we have something like

    iris$foo[1] <- 15
    

    Here, the functions are evaluated recursively. First the extractor function "$" is used to access component "foo" from iris, which is NULL:

    > "$"(iris, "foo")
    NULL
    

    Then, "[<-" is used to replace the first element of the object returned above (the NULL) with the value 15, i.e. a call of:

    > "[<-"(NULL, 1, value = 15)
    [1] 15
    

    Now, this is the object that is used as argument value in the outermost part of our call, namely the assignment using "$<-":

    > head("$<-"(iris, "foo", value = 15))
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species foo
    1          5.1         3.5          1.4         0.2  setosa  15
    2          4.9         3.0          1.4         0.2  setosa  15
    3          4.7         3.2          1.3         0.2  setosa  15
    4          4.6         3.1          1.5         0.2  setosa  15
    5          5.0         3.6          1.4         0.2  setosa  15
    6          5.4         3.9          1.7         0.4  setosa  15
    

    (Here the call is wrapped in head() to limit the number of rows shown.)
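
    The three steps can be strung together by hand. This is only a sketch of the expansion, with tmp, inner, patched and x as hypothetical names standing in for what R does internally:

    ```r
    tmp <- iris                                  # plays the role of `*tmp*`
    inner <- "$"(tmp, "foo")                     # step 1: extract; NULL, no such column
    patched <- "[<-"(inner, 1, value = 15)       # step 2: replace element 1 of NULL
    x <- "$<-"(tmp, "foo", value = patched)      # step 3: assign the result as column foo
    head(x$foo)
    ```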

    That hopefully explains how the function calls progress. The last issue to deal with is why the entire vector foo is set to 15. The answer is given in the Details section of ?"$<-.data.frame":

    Details:
    
    ....
    
         Note that there is no ‘data.frame’ method for ‘$’, so ‘x$name’
         uses the default method which treats ‘x’ as a list.  There is a
         replacement method which checks ‘value’ for the correct number of
         rows, and replicates it if necessary.
    

    The key bit is the last sentence. In the above example, the outermost assignment used value = 15. But at this point we want to replace the entire component "foo", which must have length nrow(iris). Hence, what is actually used in the outermost assignment/function call is value = rep(15, nrow(iris)).
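
    A short sketch of the consequence, using a hypothetical copy df: when the column does not exist yet the single value is replicated, whereas for an existing column only the indexed element changes.

    ```r
    df <- iris
    df$foo[1] <- 15            # foo does not exist: the length-1 result is replicated
    all_filled <- identical(df$foo, rep(15, nrow(iris)))

    df$bar <- 0                # bar exists before the indexed assignment
    df$bar[1] <- 15            # only the first element is replaced
    first_only <- df$bar[1] == 15 && all(df$bar[-1] == 0)
    ```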

    This example is all the more complex because you have to convert from the convenience notation of

    x$foo[1] <- 15
    

    into proper function calls using "$<-"(), "[<-"(), and "$"(). The example in Section 3.4.4 of The R Language Definition uses this simpler example:

    names(x)[3] <- "Three"
    

    which evaluates to

    `*tmp*` <- x
    x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
    rm(`*tmp*`)
    

    which is slightly easier to get your head around because names() looks like a usual function call.
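
    To see that the two forms agree, here is a small sketch with a hypothetical named vector; tmp stands in for `*tmp*`:

    ```r
    x <- c(a = 1, b = 2, c = 3)

    y <- x
    names(y)[3] <- "Three"                 # convenience notation

    tmp <- x
    z <- "names<-"(tmp, value = "[<-"(names(tmp), 3, value = "Three"))  # expanded form

    identical(y, z)
    ```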
