Construct object with itself as reference?

Submitted by 风流意气都作罢 on 2019-12-20 09:55:22

Question


I just realised that this program compiles and runs (gcc version 4.4.5 / Ubuntu):

#include <iostream>
using namespace std;

class Test
{
public:
  // copyconstructor
  Test(const Test& other);
};

Test::Test(const Test& other)
{
  if (this == &other)
    cout << "copying myself" << endl;
  else
    cout << "copying something else" << endl;
}

int main(int argc, char** argv)
{
  Test a(a);              // compiles, runs and prints "copying myself"
  Test *b = new Test(*b); // compiles, runs and prints "copying something else" (but reads b uninitialized: undefined behavior)
}

I wonder why on earth this even compiles. I assume that (just as in Java) arguments are evaluated before the method / constructor is called, so I suspect that this case must be covered by some "special case" in the language specification?

Questions:

  1. Could someone explain this (preferably by referring to the specification)?
  2. What is the rationale for allowing this?
  3. Is it standard C++ or is it gcc-specific?

EDIT 1: I just realised that I can even write int i = i;

EDIT 2: Even with -Wall and -pedantic the compiler doesn't complain about Test a(a);.

EDIT 3: If I add a method

Test method(Test& t)
{
  cout << "in some" << endl;
  return t;
}

I can even do Test a(method(a)); without any warnings.


Answer 1:


The reason this "is allowed" is that the rules say an identifier's scope starts immediately after the identifier is declared, that is, right after its declarator and before its initializer. In the case

int i = i;

the RHS i is "after" the LHS i so i is in scope. This is not always bad:

void *p = (void*)&p; // p contains its own address

because a variable can be addressed without its value being used. In the case of the OP's copy constructor, no error can easily be given, since binding a reference to a variable does not require the variable to be initialised: it is equivalent to taking the address of the variable. A legitimate constructor could be:

struct List { List *next; List(List &n) { next = &n; } };

where you see the argument is merely addressed, its value isn't used. In this case a self-reference could actually make sense: the tail of a list is given by a self-reference. Indeed, if you change the type of "next" to a reference, there's little choice since you can't easily use NULL as you might for a pointer.

As usual, the question is backwards. The question is not why an initialisation of a variable can refer to itself, the question is why it can't refer forward. [In Felix, this is possible]. In particular, for types as opposed to variables, the lack of ability to forward reference is extremely broken, since it prevents recursive types being defined other than by using incomplete types, which is enough in C, but not in C++ due to the existence of templates.




Answer 2:


I have no idea how this relates to the specification, but this is how I see it:

When you do Test a(a);, space for a is allocated on the stack, so the location of a in memory is known to the compiler from the start of main. When the constructor is called (the memory has of course been allocated before that), the correct this pointer is passed to it because that address is already known.

When you do Test *b = new Test(*b);, you need to think of it as two steps. First the object is allocated and constructed, and then the pointer to it is assigned to b. You get the message you get because you're essentially passing the target of an uninitialized pointer to the constructor, and then comparing its address with the actual this pointer of the object (which will eventually be assigned to b, but not before the constructor exits).




Answer 3:


The second one where you use new is actually easier to understand; what you're invoking there is exactly the same as:

Test *b;
b = new Test(*b);

and you're actually performing an invalid dereference. Try to add a << &other << to your cout lines in the constructor, and make that

Test *b = (Test *)0xF00D1E44BADD1E5;

to see that you're passing through whatever value the pointer on the stack happens to hold. If it isn't explicitly initialized, that value is indeterminate, and using it is undefined behavior. And even if you do initialize it with some sort of (in)sane default, it will still differ from the pointer that new returns, as you found out.

For the first, think of it as an in-place (placement) new. Test a is a local variable, not a pointer. It lives on the stack, and therefore its memory location is always well defined. This is very much unlike a pointer such as Test *b, which, unless explicitly initialized to some valid location, holds an indeterminate value.

If you write your first instantiation like:

Test a(*(&a));

it becomes clearer what you're invoking there.

I don't know a way to make the compiler disallow (or even warn about) this sort of self-initialization-from-nowhere through the copy constructor.




Answer 4:


The first case is (perhaps) covered by 3.8/6:

before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any lvalue which refers to the original object may be used but only in limited ways. Such an lvalue refers to allocated storage (3.7.3.2), and using the properties of the lvalue which do not depend on its value is well-defined.

Since all you're using of a (and other, which is bound to a) before the start of its lifetime is the address, I think you're good: read the rest of that paragraph for the detailed rules.

Beware, though, that 8.3.2/4 says, "A reference shall be initialized to refer to a valid object or function." There is some question (raised as a defect report against the standard) about what "valid" means in this context, so possibly you can't bind the parameter other to the unconstructed (and hence "invalid"?) a.

So I'm uncertain what the standard actually says here. Perhaps I can use an lvalue but not bind it to a reference, in which case a isn't good, while passing a pointer to a would be OK as long as it's only used in the ways permitted by 3.8/5.

In the case of b, you're using the value before it's initialized (because you dereference it, and also because even if you got that far, &other would be the value of b). This clearly is not good.

As ever in C++, it compiles because it's not a breach of language constraints, and the standard doesn't explicitly require a diagnostic. Imagine the contortions the spec would have to go through in order to mandate a diagnostic when an object is invalidly used in its own initialization, and imagine the data flow analysis that a compiler might have to do to identify complex cases (it may not even be possible at compile time, if the pointer is smuggled through an externally-defined function). Easier to leave it as undefined behavior, unless anyone has any really good suggestions for new spec language ;-)




Answer 5:


If you crank your warning levels up, your compiler will probably warn you about using uninitialized variables. But UB doesn't require a diagnostic, so many things that are "obviously" wrong may still compile.




Answer 6:


I don't know the spec reference, but I do know that accessing an uninitialized pointer always results in undefined behaviour.

When I compile your code in Visual C++ I get:

test.cpp(20): warning C4700: uninitialized local variable 'b' used



Source: https://stackoverflow.com/questions/4368361/construct-object-with-itself-as-reference
