What is the rationale for one past the last element of an array object?

空扰寡人 submitted on 2019-11-30 22:29:46

Specifying the range to loop over as the half-closed interval [start, end), especially for array indices, has certain pleasing properties, as Dijkstra observed in one of his notes (EWD831, "Why numbering should start at zero").

1) The size of the range is simply end - start. In particular, if the range is specified in terms of array indices, a loop over it performs exactly end - start iterations. If the range were [start, end], the number of iterations would be end - start + 1 - very annoying, isn't it? :)

2) Dijkstra's second observation applies only to (non-negative) integral indices. Specifying a range as [start, end) or as (start, end] both have the property mentioned in 1). However, specifying it as (start, end] requires an index of -1 to represent a range that includes index 0: you are admitting the "unnatural" value -1 just for the sake of representing the range. The [start, end) convention does not have this issue, because both of its bounds are non-negative integers (start is 0 for a range beginning at the first element), which is the natural choice when dealing with array indices.
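A minimal C sketch of both observations. The helpers `count_half_open`, `count_closed`, `sum_half_open` and `sum_open_closed` are hypothetical, named only for illustration; each loop counts or sums explicitly so the formulas above can be checked directly.

```c
#include <assert.h>

/* Observation 1: iterations over the half-open range [start, end). */
int count_half_open(int start, int end)
{
    assert(start <= end);          /* start == end is the empty range */
    int n = 0;
    for (int i = start; i < end; i++)
        n++;
    return n;                      /* always end - start */
}

/* Observation 1: iterations over the closed range [start, end]. */
int count_closed(int start, int end)
{
    assert(start <= end);          /* here the range holds at least one index */
    int n = 0;
    for (int i = start; i <= end; i++)
        n++;
    return n;                      /* always end - start + 1 */
}

/* Observation 2: summing a[0..n-1] over [0, n) uses only natural bounds. */
int sum_half_open(const int *a, int n)
{
    assert(n >= 0);
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Observation 2: the same sum written over (-1, n - 1] needs the
   "unnatural" exclusive lower bound -1 so that index 0 is included. */
int sum_open_closed(const int *a, int n)
{
    assert(n >= 0);
    const int lo = -1;             /* exclusive lower bound */
    const int hi = n - 1;          /* inclusive upper bound */
    int s = 0;
    for (int i = lo + 1; i <= hi; i++)
        s += a[i];
    return s;
}
```

For example, `count_half_open(3, 10)` returns 7 while `count_closed(3, 10)` returns 8, and both sum functions agree on any array; only the (-1, n - 1] version ever has to mention -1.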

Dijkstra's objection to allowing -1 is similar in spirit to an objection against allowing a pointer to one past the last element of an array. However, since the half-open convention had been in use for so long, it likely persuaded the standards committee to make this exception.
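That exception is what lets pointer loops use the same half-open convention as index loops. A small sketch with a hypothetical `count_char` helper: for a 5-element array `buf`, the address `buf + 5` is one past the last element, which C permits you to compute and compare (the additive-operator rules in C11 6.5.6) but never to dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of c in the half-open pointer range [begin, end).
   end may point one past the last element of the array: it may be
   formed and compared, but it is never dereferenced here. */
size_t count_char(const char *begin, const char *end, char c)
{
    assert(begin <= end);                      /* begin == end is the empty range */
    size_t n = 0;
    for (const char *p = begin; p < end; p++)  /* p never equals end inside the body */
        if (*p == c)
            n++;
    return n;
}
```

Calling `count_char(buf, buf + 5, 'a')` on `{'a', 'b', 'a', 'c', 'a'}` walks all five elements and returns 3; `count_char(buf, buf, 'a')` is an empty range and returns 0.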

The rationale is quite simple. The compiler is not allowed to place an array at the very end of memory, because the address one past its last element must still be a valid address that compares greater than the address of every element. To illustrate, assume that we have a 16-bit machine with 16-bit pointers. The lowest address is 0x0000 and the highest is 0xffff. If you declare char array[256] and the compiler locates array at address 0xff00, then technically the array would fit in memory, occupying addresses 0xff00 through 0xffff inclusive. However, the expression

char *endptr = &array[256];   // endptr points one past the end of the array

would be equivalent to

char *endptr = NULL;          // &array[256] = 0xff00 + 0x0100, which wraps around to 0x0000

This means the following loop would never execute its body, since no pointer is less than 0x0000:

for ( char *ptr = array; ptr < endptr; ptr++ )
    *ptr = 0;                 // e.g. clear each element; never runs if endptr wrapped to 0x0000

So the sections you cited are simply lawyer-speak for, "Don't put arrays at the end of a memory region".
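Real pointers cannot portably be made to wrap in C, so this sketch models the hypothetical 16-bit machine above with `uint16_t` addresses; the `addr16` type and `simulated_iterations` function are invented for illustration. It counts how many times the loop above would run for an array of `len` bytes placed at address `base`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 16-bit machine: addresses are uint16_t,
   so address arithmetic wraps modulo 0x10000. */
typedef uint16_t addr16;

/* Iterations of `for (ptr = base; ptr < end; ptr++)` on that machine. */
unsigned simulated_iterations(addr16 base, unsigned len)
{
    assert(len <= 0x10000u);
    addr16 end = (addr16)(base + len);   /* one past the end; wraps to 0x0000 at the top */
    unsigned n = 0;
    for (addr16 ptr = base; ptr < end; ptr++)  /* ptr < end <= 0xffff, so ptr itself never wraps */
        n++;
    return n;
}
```

With `base = 0xff00` and `len = 256` the function returns 0, matching the analysis above; moving the array down to 0xfe00 gives an end address of 0xff00 and restores all 256 iterations.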


Historical note: the earliest x86 processors used a segmented memory scheme wherein memory addresses were specified by a 16-bit pointer register and a 16-bit segment register. The final address was computed by shifting the segment register left by 4 bits and adding the pointer, e.g.

pointer register    1234
segment register   AB00
                   -----
address in memory  AC234

The resulting address space was 1 MByte, but a 16-bit pointer register can only reach 64 KBytes from a given segment base, so every 64 KByte segment effectively had its own end of memory. That's one reason for using lawyer-speak instead of stating "Don't put arrays at the end of memory" in plain English.
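The segment arithmetic can be sketched in C, assuming the 8086's real-mode rules: the physical address is (segment << 4) + offset, truncated to the 8086's 20 address lines, while the offset itself is a 16-bit value that wraps within its segment. The function names `physical_address` and `next_offset` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode 8086 physical address: (segment << 4) + offset, truncated to
   20 bits because the 8086 had 20 address lines. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    assert(offset <= 0xFFFFu);
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Offsets are 16-bit, so stepping past 0xFFFF wraps back to 0x0000 of the
   same segment: each segment has its own "end of memory" 64 KB after its base. */
uint16_t next_offset(uint16_t offset)
{
    return (uint16_t)(offset + 1u);
}
```

`physical_address(0xAB00, 0x1234)` yields 0xAC234, the value in the worked example above, and `next_offset(0xFFFF)` wraps to 0x0000 rather than reaching the next 64 KB.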
