What is the rationale for limitations on pointer arithmetic or comparison?


There are architectures where program and data spaces are separated, and it's simply impossible to subtract two arbitrary pointers. A pointer to a function or to const static data will be in a completely different address space than a normal variable.

Even if you arbitrarily imposed an ordering across the different address spaces, there's a possibility that ptrdiff_t would need to be a larger type. And the process of comparing or subtracting two pointers would be greatly complicated. That's a bad idea in a language that is designed for speed.
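As a minimal sketch of the boundary being described (variable names are illustrative), subtraction is only defined for pointers into the same object:

    #include <stddef.h>

    int main(void)
    {
        int a[10], b[10];
        int *p = &a[3], *q = &a[7];

        ptrdiff_t ok = q - p;   /* defined: both point into a; result is 4 */

        int *r = &b[0];
        /* ptrdiff_t bad = r - p; */
        /* Undefined: p and r point into different objects; on a machine
           with separated address spaces there may be no meaningful answer. */

        (void)ok; (void)r;
        return 0;
    }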

The reason is to keep the possibility of generating reasonable code. This applies to systems with a flat memory model as well as to systems with more complex memory models. If you forbid the (not very useful) corner cases, like adding or subtracting outside of arrays, and don't demand a total order on pointers into different objects, you can skip a lot of overhead in the generated code.

The limitations imposed by the standard allow the compiler to make assumptions about pointer arithmetic and use them to improve the quality of the code. This covers both computing things statically in the compiler instead of at runtime and choosing which instructions and addressing modes to use. As an example, consider a program with two pointers p1 and p2. If the compiler can derive that they point to different data objects, it can safely assume that no operation based on following p1 will ever affect the object pointed to by p2. This allows the compiler to reorder loads and stores based on p1 without considering loads and stores based on p2, and the other way around.
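A sketch of that effect, using restrict to assert the very non-aliasing guarantee the compiler wants to derive (the function name and shape are illustrative):

    #include <stddef.h>

    /* Because p1 and p2 are declared restrict, the compiler may assume
       stores through p1 never change *p2, so *p2 can be loaded once
       and the loop body reordered freely. */
    int zero_and_sum(int * restrict p1, const int * restrict p2, size_t n)
    {
        int total = 0;
        for (size_t i = 0; i < n; i++) {
            p1[i] = 0;      /* store through p1 */
            total += *p2;   /* candidate for hoisting out of the loop */
        }
        return total;
    }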

You only prove that the restriction could be removed, but miss that it would come with a cost (in terms of memory and code), which is contrary to the goals of C.

Specifically, the difference needs to have a type, which is ptrdiff_t, and one would assume it is similar in size to size_t.

In a segmented memory model you (normally) indirectly have a limitation on the sizes of objects - assuming that the answers in "What's the real size of `size_t`, `uintptr_t`, `intptr_t` and `ptrdiff_t` type on 16-bit systems using segmented addressing mode?" are correct.

Thus, at least for differences, removing that restriction would not only add extra instructions to ensure a total order - for an unimportant corner case (as in the other answer) - but would also double the amount of memory needed to store differences, etc.
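A small probe (assuming a hosted C99 implementation) that shows how ptrdiff_t is sized just large enough for in-object differences:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* On the 16-bit segmented systems discussed above, PTRDIFF_MAX is
           typically 32767: one machine word suffices precisely because both
           pointers in a defined subtraction must lie within one object. */
        printf("PTRDIFF_MAX = %jd\n", (intmax_t)PTRDIFF_MAX);
        printf("SIZE_MAX    = %ju\n", (uintmax_t)SIZE_MAX);
        return 0;
    }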

C was designed to be more minimalistic and not to force the compiler to spend memory and code on such cases. (In those days memory limitations mattered more.)

Obviously there are also other benefits - like the possibility of detecting errors when mixing pointers from different arrays. Similarly, mixing iterators for two different containers is undefined in C++ (with some minor exceptions), and some debug implementations detect such errors.

The rationale is that some architectures have segmented memory, and pointers to different objects may point at different memory segments. The difference between the two pointers would then not necessarily be something meaningful.

This goes back all the way to pre-standard C. The C rationale doesn't mention this explicitly, but it hints at this being the reason, if we look at where it explains why using a negative array index is undefined behavior (C99 Rationale rev. 5.10, §6.5.6, emphasis mine):

In the case of p-1, on the other hand, an entire object would have to be allocated prior to the array of objects that p traverses, so decrement loops that run off the bottom of an array can fail. This restriction allows segmented architectures, for instance, to place objects at the start of a range of addressable memory.
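The classic decrement-loop pattern the rationale alludes to, as a sketch:

    #include <stdio.h>

    int main(void)
    {
        int a[4] = {1, 2, 3, 4};

        /* This loop "runs off the bottom": on the final iteration p-- forms
           the address a - 1, which is undefined even though it is never
           dereferenced -- on a segmented machine, a might sit at offset 0
           of its segment, with no representable address before it. */
        for (int *p = &a[3]; p >= a; p--)
            printf("%d\n", *p);

        return 0;
    }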

Since the C standard intends to cover the majority of processor architectures, it should also cover this one: imagine an architecture (I know of one, but won't name it) where pointers are not just plain numbers, but are like structures or "descriptors". Such a structure contains information about the object it points into (its virtual address and size) and the offset within it. Adding an integer to or subtracting one from a pointer produces a new structure with only the offset field adjusted; producing a structure with an offset greater than the size of the object is prohibited by the hardware. There are other restrictions (such as how the initial descriptor is produced or what the other ways to modify it are), but they are not relevant to the topic.
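A hypothetical C rendering of such a descriptor (layout and field names invented for illustration; real hardware carries this in the pointer representation itself):

    #include <stdint.h>

    /* Hypothetical: what a hardware "descriptor" pointer carries. */
    struct descriptor {
        uint64_t base;    /* virtual address of the object */
        uint32_t size;    /* size of the object in bytes   */
        uint32_t offset;  /* current position within it    */
    };

    /* "p + n" adjusts only the offset; the hardware faults if the new
       offset exceeds size, so a pointer outside its object cannot even
       be represented, let alone compared with another one. */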

In most cases where the Standard classifies an action as invoking Undefined Behavior, it has done so because:

  1. There might be platforms where defining the behavior would be expensive. Segmented architectures could behave weirdly if code tries to do pointer arithmetic that extends beyond object boundaries, and some compilers may evaluate p > q by testing the sign of q-p (a sketch of why that strategy breaks across objects follows this list).

  2. There are some kinds of programming where defining the behavior would be useless. Many kinds of code can get by just fine without relying upon forms of pointer addition, subtraction, or relational comparison beyond those given by the Standard.

  3. People writing compilers for various purposes should be capable of recognizing cases where quality compilers intended for such purposes should behave predictably, and handling such cases when appropriate, whether or not the Standard compels them to do so.
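A contrived sketch of the comparison strategy mentioned in #1 (the helper name is invented for illustration): if a compiler lowers p > q to a sign test on the subtraction, the result is correct within a single object but can be wrong for pointers into different objects, which is one reason the Standard defines relational comparison only within one object.

    #include <stdbool.h>

    /* Hypothetical lowering of "p > q" on a machine that compares by
       subtracting: fine within one object, but if p and q point into
       different objects more than PTRDIFF_MAX bytes apart, q - p
       overflows and the sign -- hence the result -- is wrong. */
    static bool greater_by_diff(const char *p, const char *q)
    {
        return (q - p) < 0;   /* only defined when p and q share an object */
    }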

Both #1 and #2 are very low bars, and #3 was thought to be a "gimme". Although it has become fashionable for compiler writers to show off their cleverness by finding ways of breaking code whose behavior was defined by quality implementations intended for low-level programming, I don't think the authors of the Standard expected compiler writers to perceive a huge difference between actions which were required to behave predictably and those where nearly all quality implementations were expected to behave identically, but where it might conceivably be useful to let some arcane implementations do something else.

I would like to answer this by inverting the question. Instead of asking why pointer addition and most of the other arithmetic operations are not allowed, why do pointers allow only adding or subtracting an integer, post- and pre-increment and decrement, and comparison (or subtraction) of pointers pointing into the same array? It has to do with the logical consequence of the arithmetic operation. Adding or subtracting an integer n to/from a pointer p gives the address of the nth element from the currently pointed-to element, in either the forward or reverse direction. Similarly, subtracting two pointers p1 and p2 pointing into the same array gives the count of elements between them. The fact (or design decision) that pointer arithmetic operations are defined consistently with the type of the object pointed to is a real stroke of genius. Any operation other than the permitted ones defies programming or philosophically logical reasoning and is therefore not allowed.
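A minimal sketch of those permitted operations and their type-scaled meaning:

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        int a[5] = {10, 20, 30, 40, 50};
        int *p = a;

        /* p + 2 advances by two *elements* (2 * sizeof(int) bytes). */
        printf("*(p + 2) = %d\n", *(p + 2));   /* prints 30 */

        /* Subtracting pointers into the same array yields the element
           count between them, as a ptrdiff_t. */
        int *q = &a[4];
        ptrdiff_t n = q - p;
        printf("q - p = %td\n", n);            /* prints 4 */

        return 0;
    }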
