Is it valid to copy a struct some of whose members are not initialized?
I suspect it is undefined behavior, but if so, it makes leaving any uninitialized members in
Yes, if the uninitialized member is not an unsigned narrow character type or std::byte, then copying a struct containing this indeterminate value with the implicitly defined copy constructor is technically undefined behavior, as it is for copying a variable with indeterminate value of the same type, because of [dcl.init]/12.
This applies here, because the implicitly generated copy constructor is, except for unions, defined to copy each member individually as if by direct-initialization, see [class.copy.ctor]/4.
This is also the subject of the active CWG issue 2264.
I suppose in practice you will not have any problem with that, though.
If you want to be 100% sure, using std::memcpy always has well-defined behavior if the type is trivially copyable, even if members have indeterminate value.
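As a minimal sketch of the std::memcpy approach, assuming a struct like the one in the question: the helper names copy_bytes and demo_memcpy are made up for illustration, and the static_assert guards the "trivially copyable" precondition stated above.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Data {
    int a;
    int b;  // deliberately never initialized in this sketch
};

// std::memcpy is only appropriate for trivially copyable types.
static_assert(std::is_trivially_copyable<Data>::value,
              "Data must be trivially copyable to be copied with std::memcpy");

// Hypothetical helper: copies the object representation byte by byte,
// without interpreting the value of any member.
inline void copy_bytes(Data& dst, const Data& src) {
    std::memcpy(&dst, &src, sizeof(Data));
}

// Hypothetical usage: only the member that was assigned is read afterwards.
inline int demo_memcpy() {
    Data data;      // a and b both start with indeterminate values
    data.a = 5;
    Data data2;
    copy_bytes(data2, data);
    assert(data2.a == 5);
    return data2.a;  // data2.b is still indeterminate and is not read
}
```

Note that the copy itself is fine, but reading data2.b afterwards would still be reading an indeterminate value.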
These issues aside, you should always initialize your class members properly with a specified value at construction anyway, assuming you don't require the class to have a trivial default constructor. You can do so easily using the default member initializer syntax to e.g. value-initialize the members:
struct Data {
    int a{}, b{};
};
int main() {
    Data data;
    data.a = 5;
    Data data2 = data;
}
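The trade-off mentioned above (default member initializers cost you the trivial default constructor) can be checked with type traits. This is only a sketch: Raw is a hypothetical counterpart of Data without initializers, and copied_b is an illustrative helper, neither of which appears in the question.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical counterpart without initializers, for comparison only.
struct Raw {
    int a, b;
};

struct Data {
    int a{}, b{};
};

// Without initializers the default constructor is trivial ...
static_assert(std::is_trivially_default_constructible<Raw>::value,
              "Raw has a trivial default constructor");
// ... with default member initializers it no longer is ...
static_assert(!std::is_trivially_default_constructible<Data>::value,
              "Data has a non-trivial default constructor");
// ... but the type stays trivially copyable.
static_assert(std::is_trivially_copyable<Data>::value,
              "Data is still trivially copyable");

// Illustrative helper: every member has a specified value before the copy.
inline int copied_b() {
    Data data;          // a == 0 and b == 0
    data.a = 5;
    Data data2 = data;  // well-defined: no indeterminate value is copied
    assert(data2.a == 5);
    return data2.b;
}
```

Here copied_b() returns 0, because b was value-initialized before the copy.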
In general, copying uninitialized data is undefined behavior because that data may be in a trapping state. Quoting this page:
If an object representation does not represent any value of the object type, it is known as trap representation. Accessing a trap representation in any way other than reading it through an lvalue expression of character type is undefined behavior.
Signalling NaNs are possible for floating point types, and on some platforms integers may have trap representations.
However, for trivially copyable types it is possible to use memcpy to copy the raw representation of the object. Doing so is safe since the value of the object is not interpreted, and instead the raw byte sequence of the object representation is copied.
In some cases, such as the one described, the C++ Standard allows compilers to process constructs in whatever fashion their customers would find most useful, without requiring that behavior be predictable. In other words, such constructs invoke "Undefined Behavior". That doesn't imply, however, that such constructs are meant to be "forbidden" since the C++ Standard explicitly waives jurisdiction over what well-formed programs are "allowed" to do. While I'm unaware of any published Rationale document for the C++ Standard, the fact that it describes Undefined Behavior much like C89 does would suggest the intended meaning is similar: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior".
There are many situations where the most efficient way to process something would involve writing the parts of a structure that downstream code is going to care about, while omitting those that downstream code isn't going to care about. Requiring that programs initialize all members of a structure, including those that nothing is ever going to care about, would needlessly impede efficiency.
Further, there are some situations where it may be most efficient to have uninitialized data behave in non-deterministic fashion. For example, given:
struct q { unsigned char dat[256]; } x,y;
void test(unsigned char *arr, int n)
{
    q temp;
    for (int i=0; i<n; i++)
        temp.dat[arr[i]] = i;
    x=temp;
    y=temp;
}
if downstream code won't care about the values of any elements of x.dat or y.dat whose indices weren't listed in arr, the code might be optimized to:
void test(unsigned char *arr, int n)
{
    for (int i=0; i<n; i++)
    {
        int index = arr[i];
        x.dat[index] = i;
        y.dat[index] = i;
    }
}
This improvement in efficiency wouldn't be possible if programmers were required to explicitly write every element of temp.dat, including those downstream code wouldn't care about, before copying it.
On the other hand, there are some applications where it's important to avoid the possibility of data leakage. In such applications, it may be useful to either have a version of the code that's instrumented to trap any attempt to copy uninitialized storage without regard for whether downstream code would look at it, or it might be useful to have an implementation guarantee that any storage whose contents could be leaked would get zeroed or otherwise overwritten with non-confidential data.
From what I can tell, the C++ Standard makes no attempt to say that any of these behaviors is sufficiently more useful than the other as to justify mandating it. Ironically, this lack of specification may be intended to facilitate optimization, but if programmers can't exploit any kind of weak behavioral guarantees, any optimizations will be negated.
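For the leak-sensitive case described above, a programmer can also get the "zeroed" behavior in source code rather than relying on the implementation. The following is a sketch only, reusing struct q from the earlier example; test_zeroed and demo_zeroed are hypothetical names, and the price is writing all 256 bytes on every call.

```cpp
#include <cassert>

struct q { unsigned char dat[256]; };
q x, y;

// Variant of test() that never copies indeterminate bytes.
void test_zeroed(const unsigned char* arr, int n)
{
    q temp{};  // empty braces zero every element of temp.dat
    for (int i = 0; i < n; i++)
        temp.dat[arr[i]] = static_cast<unsigned char>(i);
    x = temp;
    y = temp;
}

// Illustrative check: listed indices hold i, all other elements are 0.
inline bool demo_zeroed()
{
    const unsigned char arr[] = {7, 42, 200};
    test_zeroed(arr, 3);
    return x.dat[7] == 0 && x.dat[42] == 1 && x.dat[200] == 2
        && x.dat[0] == 0 && y.dat[255] == 0 && y.dat[42] == 1;
}
```

With this variant the contents of x and y are fully predictable, at the cost of the efficiency gain discussed earlier.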
Since all members of Data are of primitive types, in practice data2 will get an exact "bit-by-bit copy" of all the members of data. So the value of data2.b will be exactly the same as the value of data.b. However, the exact value of data.b cannot be predicted, because you have not initialized it explicitly. It will depend on the values of the bytes in the memory region allocated for data.