Pointer-to-array overlapping end of array

花落未央 2020-12-31 00:43

Is this code correct?

int arr[2];

int (*ptr)[2] = (int (*)[2]) &arr[1];

ptr[0][0] = 0;

Obviously ptr[0][1] would be inva

6 answers
  • 2020-12-31 01:05

    Yes, this is correct code. Quoting N4140 for C++14:

    [expr.sub]/1 ... The expression E1[E2] is identical (by definition) to *((E1)+(E2))

    [expr.add]/5 ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

    There is no overflow here. &*(*(ptr)) == &ptr[0][0] == &arr[1].

    For C11 (N1570) the rules are the same; see §6.5.2.1 and §6.5.6.
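
    As a minimal sketch of the subscripting identity quoted above, the program below reads an element once with [] and once with explicit pointer arithmetic. It deliberately stays inside an ordinary int array, so it illustrates the rule without taking a side on the cast in the question. The helper name read_via_pointer is made up for this example.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical helper: reads element i through explicit pointer arithmetic. */
    static int read_via_pointer(const int *base, int i)
    {
        return *(base + i);   /* this is what base[i] means by definition */
    }

    int main(void)
    {
        int arr[2] = {10, 20};

        /* Both indices stay inside arr, so the additions are well defined. */
        assert(arr[0] == read_via_pointer(arr, 0));
        assert(arr[1] == read_via_pointer(arr, 1));

        /* arr + 2 is the one-past-the-end pointer: it may be formed and
           compared, but it must not be dereferenced. */
        const int *end = arr + 2;
        printf("%d elements\n", (int)(end - arr));
        return 0;
    }
    ```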

  • 2020-12-31 01:10

    For C++ (I'm using draft N4296), [dcl.array]/7 says in particular that if the result of subscripting is an array, it is immediately converted to a pointer. That is, in ptr[0][0], ptr[0] is first converted to int* and only then is the second [0] applied to it. So it's perfectly valid code.

    For C (C11 draft N1570) 6.5.2.1/3 states the same.
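
    A small sketch of that conversion, using a genuine two-dimensional array so that nothing here depends on the disputed cast. The helper first_of_row and the array grid are names invented for the illustration.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical helper: returns the first int of a row, given a pointer to the row. */
    static int first_of_row(int (*row)[3])
    {
        int *p = *row;   /* *row has type int[3]; it converts to int* here */
        return p[0];     /* same value as (*row)[0] and row[0][0] */
    }

    int main(void)
    {
        int grid[2][3] = {{1, 2, 3}, {4, 5, 6}};

        /* sizeof is one of the few contexts where the array is NOT converted. */
        assert(sizeof grid[1] == 3 * sizeof(int));

        /* In grid[1][0], grid[1] is converted to int* before the second [0]. */
        assert(first_of_row(&grid[1]) == grid[1][0]);
        printf("%d\n", first_of_row(&grid[1]));
        return 0;
    }
    ```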

  • 2020-12-31 01:12

    It depends on what you mean by "correct". You are casting the pointer to arr[1]. In C++ this C-style cast will probably behave as a reinterpret_cast. C and C++ are languages which (most of the time) assume that the programmer knows what they are doing. Whether this code is buggy has nothing to do with whether it is valid C/C++ code.

    You are not violating any rules in the standards (as far as I can see).

  • 2020-12-31 01:15

    Not an answer but a comment that I can't seem to word well without being a wall of text:

    Arrays are guaranteed to store their contents contiguously so that they can be 'iterated over' using a pointer. If I can take a pointer to the beginning of an array and successively increment that pointer until I have accessed every element of the array, then surely that makes a statement that the array can be accessed as a series of whatever type it is composed of.

    Surely, given the combination of: 1) array[x] stores its first element at address 'array', 2) successive increments of a pointer to it are sufficient to access the next item, 3) array[x-1] obeys the same rules,

    then it should be legal to at least look at the address 'array' as if it were type array[x-1] instead of type array[x].

    Furthermore, given the points about being contiguous and how pointers to elements in the array have to behave, surely it must be legal to then group any contiguous subset of array[x] as array[y] where y < x and its upper bound does not exceed the extent of array[x].
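
    For comparison, here is a sketch of the uncontroversial way to work with a contiguous subset: pass a pointer to its first element plus a count, and never form a pointer to a smaller array type. It does not settle whether the array-typed view argued for above is legal. The function sum_range is a name made up for this example.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: sums count ints starting at first. */
    static int sum_range(const int *first, size_t count)
    {
        int total = 0;
        for (size_t i = 0; i < count; ++i)
            total += first[i];   /* each access stays inside the original array */
        return total;
    }

    int main(void)
    {
        int arr[4] = {1, 2, 3, 4};

        /* View the middle two elements as a "sub-array" of length 2. */
        assert(sum_range(arr + 1, 2) == 5);
        printf("%d\n", sum_range(arr + 1, 2));
        return 0;
    }
    ```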

    Not being a language-lawyer this is just me spouting some rubbish. I am very interested in the outcome of this discussion though.

    EDIT:

    On further consideration of the original code, it seems to me that arrays are themselves very much a special case in many regards. They decay to a pointer, and I believe can be aliased as per what I just said earlier in this post.

    So without any standardese to back up my humble opinion, an array can't really be invalid or 'undefined' as a whole if it doesn't really get treated as a whole uniformly.

    What does get treated uniformly are the individual elements. So I think it only makes sense to talk about whether accessing a specific element is valid or defined.

  • 2020-12-31 01:19

    Trying to answer here why the code works on commonly used compilers:

    int arr[2];
    
    int (*ptr)[2] = (int (*)[2]) &arr[1];
    
    printf("%p\n", (void*)ptr);
    printf("%p\n", (void*)*ptr);
    printf("%p\n", (void*)ptr[0]);
    

    All lines print the same address on commonly used compilers. So ptr is an object for which *ptr represents the same memory location as ptr on commonly used compilers, and therefore ptr[0] is really a pointer to arr[1], and therefore ptr[0][0] is arr[1]. So the code assigns a value to arr[1].

    Now, let's suppose a perverse implementation where a pointer to an array (NOTE: I'm saying pointer to an array, i.e. &arr, which has the type int(*)[2], not arr, which converts to &arr[0] and has the type int*) is the pointer to the second byte within the array. Then dereferencing ptr is the same as subtracting 1 from ptr using char* arithmetic. For structs and unions, it is guaranteed that a pointer to such a type is the same as a pointer to its first member, but when casting a pointer to an array into a pointer to its element, no such guarantee was found (i.e. that a pointer to an array would be the same as a pointer to the first element of the array), and as a matter of fact @FUZxxl planned to file a defect report about the standard. For such a perverse implementation, *ptr, i.e. ptr[0], would not be the same as &arr[1]. On RISC processors, it would as a matter of fact cause problems due to data alignment.
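
    To make the struct/array contrast concrete, here is a hedged sketch. The struct case relies on the rule that a pointer to a structure, suitably converted, points to its initial member. The array case is the one discussed above, so the program only prints the result of that comparison (mainstream compilers are expected to print 1) instead of asserting it. The type struct pair and the helper same_address are names invented for this example.

    ```c
    #include <assert.h>
    #include <stdio.h>

    struct pair { int first; int second; };   /* hypothetical type for illustration */

    /* Hypothetical helper: 1 if both pointers compare equal as void*, else 0. */
    static int same_address(const void *a, const void *b)
    {
        return a == b;
    }

    int main(void)
    {
        struct pair s = {1, 2};
        int arr[2] = {0, 0};

        /* Struct case: the pointer to s and the pointer to its first member agree. */
        assert(same_address(&s, &s.first));

        /* Array case: pointer to the whole array versus pointer to element 0. */
        printf("%d\n", same_address(&arr, &arr[0]));
        return 0;
    }
    ```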

    Some additional fun:

    int arr[2] = {0, 0};
    int *ptr = (int*)&arr;
    ptr[0] = 5;
    printf("%d\n", arr[0]);
    

    Should that code work? It prints 5.

    Even more fun:

    int arr[2] = {0, 0};
    int (*ptr)[3] = (int(*)[3])&arr;
    ptr[0][0] = 6;
    printf("%d\n", arr[0]);
    

    Should this work? It prints 6.

    This should obviously work:

    int arr[2] = {0, 0};
    int (*ptr)[2] = &arr;
    ptr[0][0] = 7;
    printf("%d\n", arr[0]);
    
  • 2020-12-31 01:24

    Let me give a dissenting opinion: this is (at least in C++) undefined behaviour, for much the same reason as in the other question that this question linked to.

    First let me clarify the example with some typedefs that will simplify the discussion.

    typedef int two_ints[2];
    typedef int* int_ptr;
    typedef two_ints* two_ints_ptr;
    
    two_ints arr;
    
    two_ints_ptr ptr = (two_ints_ptr) &arr[1];
    
    int_ptr temp = ptr[0]; // the two_ints value ptr[0] gets converted to int_ptr
    temp[0] = 0;
    

    So the question is whether, although there is no object of type two_ints whose address coincides with that of arr[1] (in the same sense that the address of arr coincides with that of arr[0]), and therefore no object to which ptr[0] could possibly point, one can nonetheless convert the value of that expression to one of type int_ptr (here given the name temp) that does point to an object (namely the integer object also called arr[1]).

    The point where I think behaviour is undefined is in the evaluation of ptr[0], which is equivalent (per 5.2.1[expr.sub]) to *(ptr+0); more precisely the evaluation of ptr+0 has undefined behaviour.

    I'll cite my copy of the C++ standard, which is not official [N3337], but the language has probably not changed; what bothers me slightly is that the section number does not at all match the one mentioned in the accepted answer of the linked question. Anyway, for me it is §5.7 [expr.add]:

    If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce overflow; otherwise the behavior is undefined.

    Since the pointer operand ptr has type pointer to two_ints, the "array object" mentioned in the cited text would have to be an array of two_ints objects. However, there is only one such array here: the fictive array whose unique element is arr, which we are supposed to conjure up in such situations (as per: "pointer to nonarray object behaves the same as a pointer to the first element of an array of length one..."). Clearly ptr does not point to its unique element arr. So even though ptr and ptr+0 are no doubt equal values, neither of them points to an element of any array object at all (not even a fictive one), nor one past the end of such an array object, and the condition of the cited phrase is not met. The consequence is (not that overflow is produced, but) that behavior is undefined.

    So behavior is already undefined before the indirection operator * is applied. I would not argue for undefined behavior from the latter evaluation, even though the phrase "the result is an lvalue referring to the object or function to which the expression points" is hard to interpret for expressions that do not refer to any object at all. But I would be lenient in interpreting this, since I think dereferencing a pointer past an array should not itself be undefined behavior (for instance if used to initialise a reference).

    This would suggest that if instead of ptr[0][0] one wrote (*ptr)[0] or **ptr, then behaviour would not be undefined. This is curious, but it would not be the first time the C++ standard surprises me.
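
    For contrast, a sketch of the case the "array of length one" rule clearly does cover: a pointer that really points to an int[2] object. Adding 0 or 1 to it is fine; the argument above is that the question's ptr is not such a pointer. The helper step_in_bytes is a name invented for the illustration, and the example is written in C so that it can be compiled alongside the question's code.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: byte distance from p to p + 1 for a pointer to int[2]. */
    static size_t step_in_bytes(int (*p)[2])
    {
        /* p + 1 is the one-past-the-end pointer of the fictive
           one-element array whose only element is *p. */
        return (size_t)((char *)(p + 1) - (char *)p);
    }

    int main(void)
    {
        int arr[2] = {0, 0};
        int (*whole)[2] = &arr;   /* points to an actual int[2] object */

        assert(whole + 0 == &arr);                    /* stays on the element */
        assert(step_in_bytes(whole) == sizeof arr);   /* one whole array further */
        printf("%zu\n", step_in_bytes(whole));
        return 0;
    }
    ```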
