Inspired by Q6437164: can someone explain to me why the following works:
iriscopy<-iris #or whatever other data.frame
iriscopy$someNonExistantColumn[1]<
I think the answer is that it doesn't work.
I consider assignment via $newcol to be the standard way to create a new column. For example:
iris$newcol <- 1
will create a new column in the iris data.frame. All values will be 1, because of vector recycling.
This creation of a new column gets triggered when the expression evaluates to NULL. From ?"$<-":
So I think what happens here is that the expression evaluates to NULL, and this triggers the code to create a new column, which in turn uses vector recycling to fill the values.
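The two facts this relies on (a missing column extracts as NULL, and a length-one value is recycled over all rows) can be checked directly. This is only a minimal sketch; df and its column a are a hypothetical toy data frame, not part of the original question:

```r
# Hypothetical toy data frame, used instead of iris to keep the output short
df <- data.frame(a = 1:3)

# Extracting a column that does not exist yields NULL rather than an error
is.null(df$newcol)   # TRUE

# Assigning a single value creates the column and recycles the value
df$newcol <- 1
df$newcol            # 1 1 1
```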
Edit
The parsing probably works using $-assign ($<-) rather than bracket-assign ([<-). Compare:
head(`$<-`(iris, newcol, 1))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species newcol
1          5.1         3.5          1.4         0.2  setosa      1
2          4.9         3.0          1.4         0.2  setosa      1
3          4.7         3.2          1.3         0.2  setosa      1
4          4.6         3.1          1.5         0.2  setosa      1
5          5.0         3.6          1.4         0.2  setosa      1
6          5.4         3.9          1.7         0.4  setosa      1
But bracket assign produces an error:
head(`[<-`(iris, newcol, 1))
Error in head(`[<-`(iris, newcol, 1)) :
error in evaluating the argument 'x' in selecting a method for function 'head': Error in is.atomic(value) : 'value' is missing
The R language definition manual gives us a pointer to how R evaluates expressions of the form:
x$foo[1] <- 15
namely it is as if we have called
`*tmp*` <- x
x <- "$<-.data.frame"(`*tmp*`, name = "foo",
value = "[<-.data.frame"("$.data.frame"(`*tmp*`, "foo"),
1, value = 15))
rm(`*tmp*`)
The middle bit might be easier to grapple with if we drop, for purposes of exposition, the actual methods used:
x <- "$<-"(`*tmp*`, name = "foo",
value = "[<-"("$"(`*tmp*`, "foo"), 1, value = 15))
To go back to your example using iris, we have something like
iris$foo[1] <- 15
Here, the functions are evaluated recursively. First the extractor function "$" is used to access component "foo" from iris, which is NULL:
> "$"(iris, "foo")
NULL
Then, "[<-"
is used to replace the first element of the object returned above (the NULL
) with the value 15
, i.e. a call of:
> "[<-"(NULL, 1, value = 15)
[1] 15
Now, this is the object that is used as argument value in the outermost part of our call, namely the assignment using "$<-":
> head("$<-"(iris, "foo", value = 15))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species foo
1          5.1         3.5          1.4         0.2  setosa  15
2          4.9         3.0          1.4         0.2  setosa  15
3          4.7         3.2          1.3         0.2  setosa  15
4          4.6         3.1          1.5         0.2  setosa  15
5          5.0         3.6          1.4         0.2  setosa  15
6          5.4         3.9          1.7         0.4  setosa  15
(Here wrapped in head() to limit the number of rows shown.)
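To see that the three steps really add up to the shorthand, they can be run one after another and the result compared with the ordinary assignment. This is just a sketch; x, step1, step2, by_hand and shorthand are names made up for the illustration, and head(iris) is used only to keep the data small:

```r
x <- head(iris)                             # small copy to work on

step1 <- `$`(x, "foo")                      # extract the missing column: NULL
step2 <- `[<-`(step1, 1, value = 15)        # replace element 1 of NULL: 15
by_hand <- `$<-`(x, "foo", value = step2)   # assign the result as column foo

shorthand <- head(iris)
shorthand$foo[1] <- 15                      # the convenience notation

# Both routes should give the same data frame
stopifnot(identical(by_hand, shorthand))
```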
That hopefully explains how the function calls progress. The last issue to deal with is why the entire vector foo is set to 15. The answer to that is given in the Details section of ?"$<-.data.frame":
Details:
....
Note that there is no ‘data.frame’ method for ‘$’, so ‘x$name’
uses the default method which treats ‘x’ as a list. There is a
replacement method which checks ‘value’ for the correct number of
rows, and replicates it if necessary.
The key bit is the last sentence. In the above example, the outermost assignment used value = 15. But at this point we want to replace the entire component "foo", which is of length nrow(iris). Hence, what is actually used in the outermost assignment/function call is value = rep(15, nrow(iris)).
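The "checks value for the correct number of rows, and replicates it if necessary" part is easier to see with an index other than 1. The following is a rough sketch on a hypothetical 4-row data frame (df, foo, bar and baz are invented names); it assumes the replacement method only replicates a value whose length divides the number of rows and raises an error otherwise:

```r
df <- data.frame(a = 1:4)   # hypothetical 4-row data frame

# Inner step gives 15 (length 1), which is replicated to all 4 rows
df$foo[1] <- 15
df$foo                      # 15 15 15 15

# Inner step gives c(NA, 15) (length 2), which is replicated twice
df$bar[2] <- 15
df$bar                      # NA 15 NA 15

# Inner step gives c(NA, NA, 15) (length 3); 3 does not divide 4 rows,
# so the assignment fails instead of recycling
res <- tryCatch({
  df$baz[3] <- 15
  "ok"
}, error = function(e) "error")
res                         # "error"
```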
This example is all the more complex because you have to convert from the convenience notation of
x$foo[1] <- 15
into proper function calls using "$<-"(), "[<-"(), and "$"(). Section 3.4.4 of The R Language Definition illustrates the same mechanism with this simpler example:
names(x)[3] <- "Three"
which evaluates to
`*tmp*` <- x
x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
rm(`*tmp*`)
which is slightly easier to get your head around because names() looks like a usual function call.
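The equivalence can be checked on a small named vector. This is only a sketch; the vectors x and y and their contents are made up for the illustration:

```r
x <- c(a = 1, b = 2, c = 3)   # hypothetical named vector
y <- x                        # second copy for the expanded form

# Convenience notation
names(x)[3] <- "Three"

# Expanded form, following the pattern in the language definition
`*tmp*` <- y
y <- `names<-`(`*tmp*`, value = `[<-`(names(`*tmp*`), 3, value = "Three"))
rm(`*tmp*`)

stopifnot(identical(x, y))
names(x)                      # "a" "b" "Three"
```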