Why does C++ allow private members to be modified using this approach?

问题

After seeing this question a few minutes ago, I wondered why the language designers allow it as it allows indirect modification of private data. As an example

 class TestClass {
   private:
    int cc;
   public:
     TestClass(int i) : cc(i) {};
 };

 TestClass cc(5);
 int* pp = (int*)&cc;
 *pp = 70;             // private member has been modified

I tested the above code and indeed the private data has been modified. Is there any explanation of why this is allowed to happen or this just an oversight in the language? It seems to directly undermine the use of private data members.

回答1:

Because, as Bjarne puts it, C++ is designed to protect against Murphy, not Machiavelli.

In other words, it's supposed to protect you from accidents -- but if you go to any work at all to subvert it (such as using a cast) it's not even going to attempt to stop you.

When I think of it, I have a somewhat different analogy in mind: it's like the lock on a bathroom door. It gives you a warning that you probably don't want to walk in there right now, but it's trivial to unlock the door from the outside if you decide to.

Edit: as to the question @Xeo discusses, about why the standard says "have the same access control" instead of "have all public access control", the answer is long and a little tortuous.

Let's step back to the beginning and consider a struct like:

struct X {
    int a;
    int b;
};

C always had a few rules for a struct like this. One is that in an instance of the struct, the address of the struct itself has to equal the address of a, so you can cast a pointer to the struct to a pointer to int, and access a with well defined results. Another is that the members have to be arranged in the same order in memory as they are defined in the struct (though the compiler is free to insert padding between them).

For C++, there was an intent to maintain that, especially for existing C structs. At the same time, there was an apparent intent that if the compiler wanted to enforce private (and protected) at run-time, it should be easy to do that (reasonably efficiently).

Therefore, given something like:

struct Y { 
    int a;
    int b;
private:
    int c;
    int d;
public:
    int e;

    // code to use `c` and `d` goes here.
};

The compiler should be required to maintain the same rules as C with respect to Y.a and Y.b. At the same time, if it's going to enforce access at run time, it may want to move all the public variables together in memory, so the layout would be more like:

struct Z { 
    int a;
    int b;
    int e;
private:
    int c;
    int d;
    // code to use `c` and `d` goes here.
};

Then, when it's enforcing things at run-time, it can basically do something like if (offset > 3 * sizeof(int)) access_violation();

To my knowledge nobody's ever done this, and I'm not sure the rest of the standard really allows it, but there does seem to have been at least the half-formed germ of an idea along that line.

To enforce both of those, the C++98 said Y::a and Y::b had to be in that order in memory, and Y::a had to be at the beginning of the struct (i.e., C-like rules). But, because of the intervening access specifiers, Y::c and Y::e no longer had to be in order relative to each other. In other words, all the consecutive variables defined without an access specifier between them were grouped together, the compiler was free to rearrange those groups (but still had to keep the first one at the beginning).

That was fine until some jerk (i.e., me) pointed out that the way the rules were written had another little problem. If I wrote code like:

struct A { 
    int a;
public:
    int b;
public:
    int c;
public:
    int d;
};

...you ended up with a little bit of self contradition. On one hand, this was still officially a POD struct, so the C-like rules were supposed to apply -- but since you had (admittedly meaningless) access specifiers between the members, it also gave the compiler permission to rearrange the members, thus breaking the C-like rules they intended.

To cure that, they re-worded the standard a little so it would talk about the members all having the same access, rather than about whether or not there was an access specifier between them. Yes, they could have just decreed that the rules would only apply to public members, but it would appear that nobody saw anything to be gained from that. Given that this was modifying an existing standard with lots of code that had been in use for quite a while, the opted for the smallest change they could make that would still cure the problem.

回答2:

Because of backwards-compatability with C, where you can do the same thing.

For all people wondering, here's why this is not UB and is actually allowed by the standard:

First, TestClass is a standard-layout class (§9 [class] p7):

A standard-layout class is a class that:

has no non-static data members of type non-standard-layout class (or array of such types) or reference, // OK: non-static data member is of type 'int'

has no virtual functions (10.3) and no virtual base classes (10.1), // OK

has the same access control (Clause 11) for all non-static data members, // OK, all non-static data members (1) are 'private'

has no non-standard-layout base classes, // OK, no base classes

either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and // OK, no base classes again

has no base classes of the same type as the first non-static data member. // OK, no base classes again

And with that, you can are allowed to reinterpret_cast the class to the type of its first member (§9.2 [class.mem] p20):

A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.

In your case, the C-style (int*) cast resolves to a reinterpret_cast (§5.4 [expr.cast] p4).

回答3:

A good reason is to allow compatibility with C but extra access safety on the C++ layer.

Consider:

struct S {
#ifdef __cplusplus
private:
#endif // __cplusplus
    int i, j;
#ifdef __cplusplus
public:
    int get_i() const { return i; }
    int get_j() const { return j; }
#endif // __cplusplus
};

By requiring that the C-visible S and the C++-visible S be layout-compatible, S can be used across the language boundary with the C++ side having greater access safety. The reinterpret_cast access safety subversion is an unfortunate but necessary corollary.

As an aside, the restriction on having all members with the same access control is because the implementation is permitted to rearrange members relative to members with different access control. Presumably some implementations put members with the same access control together, for the sake of tidiness; it could also be used to reduce padding, although I don't know of any compiler that does that.

回答4:

The whole purpose of reinterpret_cast (and a C style cast is even more powerful than a reinterpret_cast) is to provide an escape path around safety measures.

回答5:

The compiler would have given you an error if you had tried int *pp = &cc.cc, the compiler would have told you that you cannot access a private member.

In your code you are reinterpreting the address of cc as a pointer to an int. You wrote it the C style way, the C++ style way would have been int* pp = reinterpret_cast<int*>(&cc);. The reinterpret_cast always is a warning that you are doing a cast between two pointers that are not related. In such a case you must make sure that you are doing right. You must know the underlying memory (layout). The compiler does not prevent you from doing so, because this if often needed.

When doing the cast you throw away all knowledge about the class. From now on the compiler only sees an int pointer. Of course you can access the memory the pointer points to. In your case, on your platform the compiler happened to put cc in the first n bytes of a TestClass object, so a TestClass pointer also points to the cc member.

回答6:

This is because you are manipulating the memory where your class is located in memory. In your case it just happen to store the private member at this memory location so you change it. It is not a very good idea to do because you do now know how the object will be stored in memory.

来源：https://stackoverflow.com/questions/12093319/why-does-c-allow-private-members-to-be-modified-using-this-approach

标签

c++

private-members