Why does 'out of bounds' indexing differ between a matrix and a data.frame?

Posted by 谁都会走 on 2020-05-25 11:26:13

Question


I'm sure this is kind of basic, but I'd just like to really understand the logic of R data structures here.

If I subset a matrix by index out of bounds, I get exactly that error:

m <- matrix(data = c("foo", "bar"), nrow = 1)
m[2,]
# Error in m[2, ] : subscript out of bounds

If I do the same to a data frame, however, I get a row of all NAs:

df <- data.frame(foo = "foo", bar = "bar")
df[2,]
#    foo  bar
# NA <NA> <NA>

If I subset a non-existent data frame column, I get the familiar error:

df[, 3]
# Error in `[.data.frame`(df, , 3) : undefined columns selected

I know (roughly) that data frame rows are weird and to be treated carefully, but I don't quite see the connection to the above behavior.

Can someone explain why R behaves in this way for non-existent df rows?

Update

To be sure, returning NA for out-of-bounds subsets is normal R behavior for 1D vectors:

vec <- c("foo", "bar")
vec[3]
# [1] NA

So in a way, the odd one out here is matrix subsetting, not data frame subsetting, depending on where you start from. Still, the different 2D subsetting behavior (m[2, ] vs df[2, ]) might strike a dense user (as I am right now) as inconsistent.
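A minimal sketch of that point, assuming the same 1 x 2 matrix as above: indexed with a single subscript, the matrix behaves like a plain vector and returns NA, and only the two-subscript form raises the error. The names `linear` and `two_d` are just illustrative.

```r
m <- matrix(data = c("foo", "bar"), nrow = 1)

# One subscript: the matrix is treated as a plain vector of length 2,
# so an index past the end yields NA instead of an error.
linear <- m[3]
is.na(linear)

# Two subscripts: the row index is checked against dim(m) and fails.
# tryCatch() captures the error message so the script keeps running.
two_d <- tryCatch(m[2, ], error = function(e) conditionMessage(e))
two_d
```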


Answer 1:


Can someone explain why R behaves in this way[?]

Short answer: No, probably not.


Longer answer: Once upon a time I was thinking about something similar and read this thread on R-devel: Definition of [[. Basically it boils down to:

The semantics of [ and [[ don't seem to be fully specified in the Reference manual. [...] I assume that these are features, not bugs, but I can't find documentation for them

Duncan Murdoch, a former member of the R core team, gives a very nice reply:

There is more documentation in the man page for Extract, but I think it is incomplete. The most complete documentation is of course the source code*, but it may not answer the question of what's intentional and what's accidental

As mentioned in the R-devel thread, the only description in the manual is 3.4.1 Indexing by vectors:

If i is positive and exceeds length(x) then the corresponding selection is NA

But this applies only to "indexing of simple vectors". Similar out-of-bounds indexing for "non-simple" vectors does not seem to be described. Duncan Murdoch again:
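To illustrate how the result varies once you leave atomic vectors, here is a small sketch. It only demonstrates observed behavior and is not a statement about what the manual means by "simple"; the object names are illustrative.

```r
vec <- c("foo", "bar")        # atomic character vector
lst <- list("foo", "bar")     # list (a generic vector)
df  <- data.frame(foo = "foo", bar = "bar")

# Atomic vector: the out-of-bounds position is filled with NA.
vec_oob <- vec[3]

# List: `[` still succeeds, but the extra element is NULL rather than NA.
lst_oob <- lst[3]

# Data frame rows: an out-of-bounds row index gives one row of NAs.
row_oob <- df[2, ]

is.na(vec_oob)
is.null(lst_oob[[1]])
all(is.na(row_oob))
```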

So what is a simple vector? That is not explicitly defined, and it probably should be.

Thus, it may seem like no one knows the answer to your why question.


See also "8.2.13 nonexistent value in subscript" in the excellent R Inferno by Patrick Burns, and the section "Missing/out of bounds indices" in Hadley Wickham's book Advanced R.


*Source code for the [ subset operator. A search for R_MSG_subs_o_b (which corresponds to the error message "subscript out of bounds") provides no obvious clue as to why OOB indexing of matrices with [, and OOB indexing with [[, give an error, whereas OOB [ indexing of "simple vectors" results in NA.
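The [ versus [[ contrast mentioned in the footnote can be checked with a short sketch. The helper `raises_error()` is hypothetical, written here only for illustration; it reports whether evaluating an expression signals an error.

```r
# Hypothetical helper: TRUE if evaluating `expr` signals an error.
# `expr` is a lazily evaluated argument, so it is forced inside tryCatch().
raises_error <- function(expr) {
  tryCatch({
    force(expr)
    FALSE
  }, error = function(e) TRUE)
}

vec <- c("foo", "bar")
lst <- list("foo", "bar")

single_vec <- raises_error(vec[3])    # `[` on an atomic vector
double_vec <- raises_error(vec[[3]])  # `[[` on an atomic vector
single_lst <- raises_error(lst[3])    # `[` on a list
double_lst <- raises_error(lst[[3]])  # `[[` on a list

c(single_vec = single_vec, double_vec = double_vec,
  single_lst = single_lst, double_lst = double_lst)
```

In this sketch only the [[ calls signal an error; both [ calls return a value.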



Source: https://stackoverflow.com/questions/53448128/why-does-out-of-bounds-indexing-differ-between-a-matrix-and-a-data-frame
