Row numbers differ (NA vs 1) when adding first row to empty data.frame

北海茫月 2021-01-12 21:52

I'd like to understand why these two methods for indexing an empty data.frame result in an NA row number being assigned to the first row only.

1 answer
  • 2021-01-12 22:07

    Well, the most important part of this answer is that code like this should be avoided. It is very inefficient to add data to a data.frame in R row by row (see Circle 2 of the R Inferno). There are almost always better ways to do this, depending on what exactly you are doing.
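
    As a rough illustration of what "better ways" can look like, here is a minimal sketch. The column names Number and Text and the sample values are assumptions made up for this example, not taken from the question. It either builds the data.frame in one call from complete vectors, or collects the pieces in a list and binds them once at the end.

    ```r
    # Hypothetical input values, invented for illustration only.
    numbers <- c(123456, 456789)
    texts   <- c("first", "second")

    # Option 1: build the whole data.frame in one call from complete vectors.
    df_all <- data.frame(Number = numbers, Text = texts, stringsAsFactors = FALSE)

    # Option 2: if rows arrive one at a time, collect them in a list
    # and bind them together once, instead of growing the data.frame.
    row_list <- lapply(seq_along(numbers), function(i) {
      data.frame(Number = numbers[i], Text = texts[i], stringsAsFactors = FALSE)
    })
    df_bound <- do.call(rbind, row_list)

    print(df_all)
    print(df_bound)
    ```

    Both variants allocate the final data.frame once rather than copying it on every added row.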

    As for what is going on here: all of this comes down to the $<-.data.frame, [.data.frame, and [<-.data.frame functions. In the first case, with

    df[1,]$Number <- 123456
    

    you are doing the subset first, which calls [.data.frame. When you ask for a row of a data.frame that doesn't exist, you get NA values for everything (including the row name). So now you have a one-row data.frame with NA values in the columns and an NA row name. Next, $<-.data.frame is called to update just the Number column; it does not touch the row names. This new value then gets passed to [<-.data.frame to merge it back into the data.frame. When this command runs, it checks that there are no duplicated row names. For the first row, since there's only one row and it has the name NA, that name is kept. However, when there are duplicate names, the function replaces them with the row index. That's why you get an NA for the first row, but when it tries to add the next row, it tries NA again, sees that it is a duplicate, and has to choose a new name. (See what happens when you try df[1:2,]$Number <- 123456 and then df[3,]$Number <- 456789.)
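
    The behaviour described above can be checked with a small sketch. The two-column layout (Number, Text) is an assumption for illustration, since the original df is not shown here; a second column is included so that df[1, ] stays a data.frame instead of dropping to a vector.

    ```r
    # Hypothetical empty data.frame; column names are assumptions.
    df <- data.frame(Number = numeric(0), Text = character(0),
                     stringsAsFactors = FALSE)

    # Extracting a row that does not exist yields a placeholder row
    # whose row name is the string "NA".
    placeholder_name <- rownames(df[1, ])
    print(placeholder_name)

    # First row: the placeholder name is not a duplicate, so it is kept.
    df[1, ]$Number <- 123456

    # Second row: the placeholder name clashes with the existing one,
    # so the row index is used instead.
    df[2, ]$Number <- 456789

    print(rownames(df))
    ```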

    On the other hand, when you do

    df[1,1] <- 123456
    

    that doesn't do the subsetting first to create a row with a missing row name. You go right to the assignment, skipping $<-.data.frame and [.data.frame. In this case, it doesn't have to merge in a new row with an NA row name; it can create the row right away and assign a row name. This is just a special property of calling the assignment operator without having to do the extraction first. You can turn the debugger on with debug(`[<-.data.frame`) to see exactly how that happens.

    So the first method is basically doing three steps: 1) extract df[1,], 2) change the value of the Number column, then 3) merge that new value back into df[1,]. The second method skips the first two steps and just merges values directly into df[1,]. The real difference is just how each of those functions chooses row names for rows that don't exist yet.
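
    To make the three steps concrete, they can be written out by hand and compared with the direct assignment. This is only a sketch of the sequence described above, not the exact internal call R makes; the data.frame layout (Number, Text) and the helper name make_empty are assumptions for illustration.

    ```r
    # Hypothetical helper that returns a fresh empty data.frame.
    make_empty <- function() {
      data.frame(Number = numeric(0), Text = character(0),
                 stringsAsFactors = FALSE)
    }

    # Method 1, spelled out step by step.
    df_a <- make_empty()
    tmp <- df_a[1, ]         # 1) extract: one row of NAs, row name "NA"
    tmp$Number <- 123456     # 2) update the column; row name is untouched
    df_a[1, ] <- tmp         # 3) merge back; the row name comes from tmp

    # Method 2: direct assignment, no extraction step.
    df_b <- make_empty()
    df_b[1, 1] <- 123456     # the new row is named by its index

    print(rownames(df_a))
    print(rownames(df_b))
    ```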
