Why does MISRA C state that a copy of pointers can cause a memory exception?

社会主义新天地 提交于 2019-11-28 08:01:52

According to the Standard, copying the pointer q = p;, is undefined behaviour.

Reading J.2 Undefined behaviour states:

The value of a pointer to an object whose lifetime has ended is used (6.2.4).

Going to that chapter we see that:

6.2.4 Storage durations of objects

The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address,33)and retains its last-stored value throughout its lifetime.34)If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

What is indeterminate:

3.19.2 indeterminate value: either an unspecified value or a trap representation

giorgi moniava

Once you free an object through the pointer, all pointers to that memory become indeterminate. (Even) reading indeterminate memory is undefined behaviour (UB). Following is UB:

char *p = malloc(5);
free(p);
if(p == NULL) // UB: even just reading value of p as here, is UB
{

}

First, some history...

When ISO/IEC JTC1/SC22/WG14 first started to formalise the C Language (to produce what is now ISO/IEC 9899:2011) they had a problem.

Many compiler vendors had interpreted things in different ways.

Early on, they made a decision to not break any existing functionality... so where compiler implementations were divergent, the Standard offers unspecified and undefined behaviours.

MISRA C attempts to trap the pit-falls that these behaviours will trigger. So much for the theory...

--

Now to the specific of this question:

Given that the point of free() is to release the dynamic memory back to the heap, there were three possible implementations, all of which were "in the wild":

  1. reset the pointer to NULL
  2. leave the pointer as was
  3. destroy the pointer

The Standard could not mandate any one of these, so formally leaves the behaviour as undefined - your implementation may follow one path, but a different compiler could do something else... you cannot assume, and it is dangerous to rely on a method.

Personally, I'd rather the Standard was specific, and required free() to set the pointer to NULL, but that's just my opinion.

--

So the TL;DR; answer is, unfortunately: because it is!

While both p and q are both pointer variables on the stack, the memory address returned by malloc() is not on the stack.

Once a memory area that was successfully malloced is freed then at that point there is no telling who may be using the memory area or the disposition of the memory area.

So once free() is used to free an area of memory previously obtained using malloc() an attempt to use the memory area is an undefined type of action. You might get lucky and it will work. You might be unlucky and it will not. Once you free() a memory area, you no longer own it, something else does.

The issue here would appear to be what machine code is involved in copying a value from one memory location to another. Remember that MISRA targets embedded software development so the question is always what kind of funky processors are out there that do something special with a copy.

The MISRA standards are all about robustness, reliability, and eliminating risk of software failure. They are quite picky.

The value of p cannot be used as such after the memory it points to has been freed. More generally, the value of an uninitialized pointer has the same status: even just reading it for the purpose of copying to invokes undefined behavior.

The reason for this surprising restriction is the possibility of trap representations. Freeing the memory pointed to by p can make its value become a trap representation.

I remember one such target, back in the early 1990s that behaved this way. Not en embedded target then and rather in widespread use then: Windows 2.x. It used the Intel architecture in 16-bit protected mode, where pointers were 32-bit wide, with a 16-bit selector and a 16-bit offset. In order to access the memory, pointers were loaded in a pair of registers (a segment register and an address register) with a specific instruction:

    LES  BX,[BP+4]   ; load pointer into ES:BX

Loading the selector part of the pointer value into a segment register had the side effect of validating the selector value: if the selector did not point to a valid memory segment, an exception would be fired.

Compiling the innocent looking statement q = p; could be compiled in many different ways:

    MOV  AX,[BP+4]    ; loading via DX:AX registers: no side effects
    MOV  DX,[BP+6]
    MOV  [BP-6],AX
    MOV  [BP-4],DX

or

    LES  BX,[BP+4]    ; loading via ES:BX registers: side effects
    MOV  [BP-6],BX
    MOV  [BP-4],ES

The second option has 2 advantages:

  • The code is more compact, 1 less instruction

  • The pointer value is loaded into registers that can be used directly to dereference the memory, which can result in fewer instructions generated for subsequent statements.

Freeing the memory may unmap the segment and make the selector invalid. The value becomes a trap value and loading it into ES:BX fires an exception, also called trap on some architectures.

Not all compilers would use the LES instruction for just copying pointer values because it was slower, but some did when instructed to generate compact code, a common choice then as memory was rather expensive and scarce.

The C Standard allows for this and describes a form of undefined behavior the code where:

The value of a pointer to an object whose lifetime has ended is used (6.2.4).

because this value has become indeterminate as defined this way:

3.19.2 indeterminate value: either an unspecified value or a trap representation

Note however that you can still manipulate the value by aliasing via a character type:

/* dumping the value of the free'd pointer */
unsigned char *pc = (unsigned char*)&p;
size_t i;
for (i = 0; i < sizeof(p); i++)
    printf("%02X", pc[i]);   /* no problem here */

/* copying the value of the free'd pointer */
memcpy(&q, &p, sizeof(p));   /* no problem either */

There are two reasons that code which examines a pointer after freeing it is problematic even if the pointer is never dereferenced:

  1. The authors of the C Standard did not wish to interfere with implementations of the language on platforms where pointers contain information about the surrounding memory blocks, and which might validate such pointers whenever anything is done with them, whether they are dereferenced or not. If such platforms exist, code which uses pointers in violation of the Standard might not work with them.

  2. Some compilers operate on the presumption that a program will never receive any combination of inputs that would invoke UB, and thus any combination of inputs that would produce UB should be presumed impossible. As a consequence of this, even forms of UB which would have no detrimental effect on the target platform if a compiler simply ignored them may end up having arbitrary and unlimited side-effects.

IMHO, there is no reason why equality, relational, or pointer-difference operators upon freed pointers should have any adverse effect on any modern system, but because it is fashionable for compilers to apply crazy "optimizations", useful constructs which should be usable on commonplace platforms have become dangerous.

The poor wording in the sample code is throwing you off.

It says "value of p is indeterminate", but it is not the value of p that is indeterminate, because p still has the same value (the address of a memory block which has been released).

Calling free(p) does not change p -- p is only changed once you leave the scope in which p is defined.

Instead, it is the value of what p points to that is indeterminate, since the memory block has been released, and it may as well be unmapped by the operating system. Accessing it either through p or through an aliased pointer (q) may cause an access violation.

An important concept to internalize is the meaning of "indeterminate" or "undefined" behavior. It is exactly that: unknown and unknowable. We would often tell students "It is perfectly legitimate for your computer to melt into a shapeless blob, or for the disk to fly off to Mars". As I read the original documentation included, I did not see any place it said to not use malloc. It merely points out that an erroneous program will fail. Actually, having the program take a memory exception is a Good Thing, because it tells you immediately that your program is defective. Why the document suggests this might be a Bad Thing escapes me. What is a Bad Thing is that on most architectures, it will NOT take a memory exception. Continuing to use that pointer will produce erroneous values, potentially render the heap unusable, and, if that same block of storage is allocated for a different use, corrupting the valid data of that use, or interpreting its values as your own. Bottom line: don't use 'stale' pointers! Or, to put it another way, writing defective code means that it won't work.

Furthermore, the act of assigning p to q is most decidedly NOT "undefined". The bits stored in the variable p, which are meaningless nonsense, are quite easily, and correctly, copied to q. All this means now is that any value that is accessed by p can now also be accessed by q, and since p is undefined nonsense, q is now undefined nonsense. So using either one of them to read or write will produce "undefined" results. If you are lucky enough to be running on an architecture that can cause this to take a memory fault, you will easily detect the improper usage. Otherwise, using either pointer means your program is defective. Plan on spending a lot of hours finding it.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!