I'd like to understand why these two methods for indexing an empty data.frame result in an NA row number being assigned to the first row only.
Well, the most important part of this answer is that code like this should be avoided. It is very inefficient to add data to a data.frame in R row by row (see Circle 2 of the R Inferno). There are almost always better ways to do this, depending on what exactly you are doing.
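As a rough sketch of one such alternative (the values and the Text column are made up for illustration, and the right approach depends on your data), you can build each column as a whole vector and create the data.frame once, or collect the pieces in a list and bind them a single time at the end:

```r
# Hypothetical example values; only the Number column comes from the question.
numbers <- c(123456, 456789, 789123)
texts   <- c("a", "b", "c")

# Option 1: build the columns as whole vectors, then create the data.frame once.
df <- data.frame(Number = numbers, Text = texts, stringsAsFactors = FALSE)

# Option 2: if the rows arrive one at a time, collect them in a list
# and bind them together once, instead of growing df inside the loop.
pieces <- lapply(seq_along(numbers), function(i) {
  data.frame(Number = numbers[i], Text = texts[i], stringsAsFactors = FALSE)
})
df2 <- do.call(rbind, pieces)

print(df)
print(df2)
```

Either way the data.frame is allocated once rather than being copied on every added row.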
As for what's going on here: all of this comes down to the $<-.data.frame
, [.data.frame
, and [<-.data.frame
functions. In the first case, with
df[1,]$Number <- 123456
you are doing the subset first, which calls [.data.frame
. When you ask for a row of a data.frame that doesn't exist, you get a bunch of NA values for everything (including the row name). So now you have a one-row data.frame with NA values in the columns and in the row name. Now you call $<-.data.frame
to just update the Number
column. You don't update the row name. This new value then gets passed to [<-.data.frame
to merge it back into the data.frame. When this command runs, it checks to make sure that there are no duplicated row names. For the first row, since there's only one row and it has the name NA, that name is kept. However, when there are duplicate names, the function replaces those names with the row index. That's why you get an NA for the first row, but when it tries to add the next row, it tries NA again, sees that's a duplicate, and has to choose a new name. (See what happens when you try df[1:2,]$Number <- 123456
then df[3,]$Number <- 456789
)
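Here is a small sketch you can run to watch this happen. The exact data.frame isn't shown here, so the setup below is an assumption: an empty data.frame with a Number column and a hypothetical second column, Text.

```r
# Assumed setup: an empty two-column data.frame (Text is a hypothetical column).
df <- data.frame(Number = numeric(0), Text = character(0),
                 stringsAsFactors = FALSE)

# Extracting a row that doesn't exist returns one row of NA values,
# and the row name of that row is NA as well.
row1 <- df[1, ]
print(row1)
print(rownames(row1))

# The experiment suggested above: inspect the row names after each assignment.
df[1:2, ]$Number <- 123456
print(rownames(df))
df[3, ]$Number <- 456789
print(rownames(df))
```

Comparing the row names printed after each assignment shows which names were carried over from the extracted rows and which were replaced by a row index.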
On the other hand, when you do
df[1,1] <- 123456
the subsetting isn't done first to create a row with a missing row name. You go right to assignment, skipping $<-.data.frame
and [.data.frame
. In this case, it doesn't have to merge in a new row with an NA row name; it can create the row right away and assign a row name. This is just a special property of calling the assignment operator without having to do the extraction first. You can put the debugger on with debug(`[<-.data.frame`)
to see exactly how that happens.
So the first method is basically doing three steps: 1) extract df[1,]
, 2) change the value of the number column, then 3) merge that new value back into df[1,]
. The second method skips the first two steps and just merges values directly into df[1,]
. And the real difference is just how each of those functions chooses row names for rows that don't exist yet.
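To make the comparison concrete, here is a sketch that spells the three steps out by hand and puts the direct assignment next to them. The names tmp, df_a and df_b and the Text column are hypothetical; R uses its own internal temporary when it evaluates df[1,]$Number <- 123456, so treat this as an approximation of what happens rather than the exact internals.

```r
# Assumed setup: an empty two-column data.frame (Text is a hypothetical column).
df <- data.frame(Number = numeric(0), Text = character(0),
                 stringsAsFactors = FALSE)

# First method, written out as three explicit steps.
tmp <- df[1, ]            # 1) extract: one row of NAs with an NA row name
tmp$Number <- 123456      # 2) change only the Number column
df_a <- df
df_a[1, ] <- tmp          # 3) merge back: tmp's row name comes along

# Second method: direct assignment, no extraction step.
df_b <- df
df_b[1, 1] <- 123456

print(rownames(df_a))
print(rownames(df_b))
```

The first set of row names should show the NA name described above, while the second gets an ordinary row name because no extracted row with an NA name was ever merged in.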