Existence of objects created in C functions

问题

It has been established (see below) placement new is required to create objects

int* p = (int*)malloc(sizeof(int));
*p = 42;  // illegal, there isn't an int

Yet that is a pretty standard way of creating objects in C.

The question is, does the int exist if it is created in C, and returned to C++?

In other words, is the following guaranteed to be legal? Assume int is the same for C and C++.

foo.h

#ifdef __cplusplus
extern "C" {
#endif

int* foo(void);

#ifdef __cplusplus
}
#endif

foo.c

#include "foo.h"
#include <stdlib.h>

int* foo(void) {
    return malloc(sizeof(int));
}

main.cpp

#include "foo.h"
#include<cstdlib>

int main() {
    int* p = foo();
    *p = 42;
    std::free(p);
}

Links to discussions on mandatory nature of placement new:

Is placement new legally required for putting an int into a char array?
https://stackoverflow.com/a/46841038/4832499
https://groups.google.com/a/isocpp.org/forum/#!msg/std-discussion/rt2ivJnc4hg/Lr541AYgCQAJ
https://www.reddit.com/r/cpp/comments/5fk3wn/undefined_behavior_with_reinterpret_cast/dal28n0/
reinterpret_cast creating a trivially default-constructible object

回答1:

Yes! But only because int is a fundamental type. Its initialization is vacuous operation:

[dcl.init]/7:

To default-initialize an object of type T means:

If T is a (possibly cv-qualified) class type, constructors are considered. The applicable constructors are enumerated ([over.match.ctor]), and the best one for the initializer () is chosen through overload resolution. The constructor thus selected is called, with an empty argument list, to initialize the object.

If T is an array type, each element is default-initialized.

Otherwise, no initialization is performed.

Emphasis mine. Since "not initializing" an int is akin to default initialing it, it's lifetime begins once storage is allocated:

[basic.life]/1:

The lifetime of an object or reference is a runtime property of the object or reference. An object is said to have non-vacuous initialization if it is of a class or aggregate type and it or one of its subobjects is initialized by a constructor other than a trivial default constructor. The lifetime of an object of type T begins when:

storage with the proper alignment and size for type T is obtained, and

if the object has non-vacuous initialization, its initialization is complete,

Allocation of storage can be done in any way acceptable by the C++ standard. Yes, even just calling malloc. Compiling C code with a C++ compiler would be a very bad idea otherwise. And yet, the C++ FAQ has been suggesting it for years.

In addition, since the C++ standard defers to the C standard where malloc is concerned. I think that wording should be brought forth as well. And here it is:

7.22.3.4 The malloc function - Paragraph 2:

The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

The "value is indeterminate" part kinda indicates there's an object there. Otherwise, how could it have any value, let alone an indeterminate one?

回答2:

I think the question is badly posed. In C++ we only have the concepts of translation units and linkage, the latter simply meaning under which circumstances names declared in different TUs refer to the same entity or not.

Nothing is virtually said about the linking process as such, the correctness of which must be guaranteed by the compiler/linker anyway; even if the code snippets above were purely C++ sources (with malloc replaced with a nice new int) the result would be still implementation defined ( eg. consider object files compiled with incompatible compiler options/ABIs/runtimes ).

So, either we talk in full generality and conclude that any program made of more than one TU is potentially wrong or we must take for granted that the linking process is 'valid' ( only the implementation knows ) and hence take for granted that if a function from some source language ( in this case C ) primises to return a 'pointer to an existing int' then the the same function in the destination language (C++) must still be a 'pointer to an existing int' (otherwise, following [dcl.link], we could't say that the linkage has been 'achieved', returning to the no man's land).

So, in my opinion, the real problem is assessing what an 'existing' int is in C and C++, comparatively. As I read the correponding standards, in both languages an int lifetime basically begins when its storage is reserved for it: in the OP case of an allocated(in C)/dynamic(in c++) storage duration object, this occurs (on C side) when the effective type of the lvalue *pointer_to_int becomes int (eg. when it's assigned a value; until then, the not-yet-an-int may trap(*)).

This does not happen in the OP case, the malloc result has no effective type yet. So, that int does not exist neither in C nor in C++, it's just a reinterpreted pointer.

That said, the c++ part of the OP code assigns just after returning from foo(); if this was intended, then we could say that given that malloc() in C++ is required having C semantics, a placement new on the c++ side would suffice to make it valid (as the provided links show).

So, summarizing, either the C code should be fixed to return a pointer to an existing int (by assigning to it) or the c++ code should be fixed by adding placement new. (sorry for the lengthy arguing ... :))

(*) here I'm not claiming that the only issue is the existence of trap representation; if it were, one could argue that the result of foo() is an indeterminate value on C++ side, hence something that you can safely assign to. Clearly this is not the case because there are also aliasing rules to take into account ...

回答3:

I can identify two parts of this question that should be addressed separately.

Object lifetime

It has been established (see below) placement new is required to create objects

I posit that this area of the standard contains ambiguity, omission, contradiction, and/or gratuitous incompatibility with existing practice, and should therefore be considered broken.

The only people who should be interested in what a broken part of the standard actually says are the people responsible for fixing the breakage. Other people (language users and language implementors alike) should defer to existing practice and common sense. Both of which say that one does not need new to create an int, malloc is enough.

This document identifies the problem and proposes a fix (thanks @T.C. for the link)

C compatibility

Assume int is the same for C and C++

It is not enough to assume that.

One also needs to assume that int* is the same, that the same memory is accessible by C and C++ functions linked together in a program, and that the C++ implementation does not define the semantics of calls to functions written in the C programming language to be wiping your hard drive and stealing your girlfriend. In other words, that C and C++ implementations are compatible enough.

None of this is stipulated by the standard or should be assumed. Indeed, there are C implementations that are incompatible with each other, so they cannot be both compatible with the same C++ implementation.

The only thing the standard says is "Every implementation shall provide for linkage to functions written in the C programming language" (dcl.link) What is the semantics of such linkage is left undefined.

Here, as before, the best course of action is to defer to existing practice and common sense. Both of which say that a C++ implementation usually comes bundled with a compatible enough C implementation, with the linkage working as one would expect.

回答4:

The question is meaningless. Sorry. This is the only "lawyer" answer possible.

It is meaningless because the C++ and the C language ignore each others, as they ignore anything else.

Nothing in either language is described in term of low level implementation (which is ridiculous for languages often described as "high level assembly"). Both C and C++ are specified (if you can call that a specification) at a very abstract level, and the high and low levels are never reconnected. This generates endless debates about what undefined behaviors means in practice, how unions work, etc.

回答5:

Although neither the C Standard nor, so far as I know, the C++ Standard officially recognizes the concept, almost any platform which allows programs produce by different compilers to be linked together will support opaque functions.

When processing a call to an opaque function, a compiler will start by ensuring that the value of all objects that might legitimately be examined by outside code is written to the storage associated with those objects. Once that is done, it will place the function's arguments in places specified by the platform's documentation (the ABI, or Application Binary Interface) and perform the call.

Once the function returns, the compiler will assume that any objects which an outside function could have written, may have been written, and will thus reload any such values from the storage associated with those objects the next time they are used.

If the storage associated with an object holds a particular bit pattern when an opaque function returns, and if the object would hold that bit pattern when it has a defined value, then a compiler must behave as though the object has that defined value without regard for how it came to hold that bit pattern.

The concept of opaque functions is very useful, and I see no reason that the C and C++ Standards shouldn't recognize it, nor provide a standard "do nothing" opaque function. To be sure, needlessly calling opaque functions will greatly impede what might otherwise be useful optimizations, but being able to force a compiler to treat actions as opaque function calls when needed may make it possible to enable more optimizations elsewhere.

Unfortunately, things seem to be going in the opposite direction, with build systems increasingly trying to apply "whole program" optimization. WPO would be good if there were a way of distinguishing between function calls that were opaque because the full "optimization barrier" was needed, from those which had been treated as opaque simply because there was no way for optimizers to "see" across inter-module boundaries. Unless or until proper barriers are added, I don't know any way to ensure that optimizers won't get "clever" in ways that break code which would have had defined behavior with the barriers in place.

回答6:

No, the int does not exist, as explained in the linked Q/As. An important standard quote reads like this in C++14:

1.8 The C ++ object model [intro.object]
[...] An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed. [...]

(12.2 is a paragraph about temporary objects)

The C++ standard has no rules for interfacing C and C++ code. A C++ compiler can only analyze objects created by C++ code, but not some bits passed to it form an external source like a C program, or a network interface, etc.

Many rules are tailored to make optimizations possible. Some of them are only possible if the compiler does not have to assume uninitialized memory contains valid objects. For example, the rule that one may not read an uninitialized int would not make sense otherwise, because if ints may exist anywhere, why would it be illegal to read an indeterminate int value?

This would be a standard compliant way to write the program:

int main() {
    void* p = foo();
    int i = 42;
    memcpy(p, &i, sizeof(int));
    //std::free(p);   //this works only if C and C++ use the same heap.
}

来源：https://stackoverflow.com/questions/46909105/existence-of-objects-created-in-c-functions

标签

c++

language-lawyer

object-lifetime