Why can a T* be passed in register, but a unique_ptr cannot?

Question

I'm watching Chandler Carruth's talk at CppCon 2019:

There are no Zero-Cost Abstractions

In it, he gives an example of how surprised he was by the overhead you incur by using a std::unique_ptr<int> instead of an int*; that segment starts at about 17:25.

You can have a look at the compilation results of his pair of example snippets (godbolt.org) to see that, indeed, the compiler seems unwilling to pass the unique_ptr value, which at bottom is just an address, in a register; it passes it only through memory.

One of the points Mr. Carruth makes at around 27:00 is that the C++ ABI requires by-value parameters (some but not all; perhaps non-primitive types? non-trivially-constructible types?) to be passed in memory rather than in a register.

My questions:

Is this actually an ABI requirement on some platforms? (which?) Or maybe it's just some pessimization in certain scenarios?
Why is the ABI like that? That is, if the fields of a struct/class fit within registers, or even a single register - why should we not be able to pass it within that register?
Has the C++ standards committee discussed this point in recent years, or ever?

PS - So as not to leave this question with no code:

Plain pointer:

void bar(int* ptr) noexcept;
void baz(int* ptr) noexcept;

void foo(int* ptr) noexcept {
    if (*ptr > 42) {
        bar(ptr); 
        *ptr = 42; 
    }
    baz(ptr);
}

Unique pointer:

#include <memory>   // std::unique_ptr
#include <utility>  // std::move

using std::unique_ptr;
void bar(int* ptr) noexcept;
void baz(unique_ptr<int> ptr) noexcept;

void foo(unique_ptr<int> ptr) noexcept {
    if (*ptr > 42) {
        bar(ptr.get());
        *ptr = 42;
    }
    baz(std::move(ptr));
}
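
The remark above that a unique_ptr value is "just an address" can be checked with a small sketch. This is an illustration, not part of the talk's example: the helper names are made up here, and the size equality is a property of common standard library implementations with the default deleter, not something the standard guarantees.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: reports whether std::unique_ptr<int> is the same
// size as a raw int*. Assumption: true on common implementations with the
// default deleter; the standard does not promise it.
bool unique_ptr_is_pointer_sized() {
    return sizeof(std::unique_ptr<int>) == sizeof(int*);
}

// Hypothetical helper: get() hands back exactly the address that was stored.
bool holds_same_address(int* raw) {
    std::unique_ptr<int> owner(raw);
    bool same = (owner.get() == raw);
    (void)owner.release();  // give ownership back so `raw` is not deleted here
    return same;
}

void demo_just_an_address() {
    int* p = new int(7);
    assert(holds_same_address(p));
    delete p;
}
```

So the extra cost discussed in the question does not come from the amount of data being passed, but from how the calling convention treats the type.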

Answer 1:

Is this actually an ABI requirement, or maybe it's just some pessimization in certain scenarios?

One example is the System V Application Binary Interface AMD64 Architecture Processor Supplement. This ABI is for 64-bit x86-compatible CPUs (the x86_64 architecture). It is followed on Solaris, Linux, FreeBSD, macOS, and Windows Subsystem for Linux:

If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer that has class INTEGER).

An object with either a non-trivial copy constructor or a non-trivial destructor cannot be passed by value because such objects must have well defined addresses. Similar issues apply when returning an object from a function.

Note that at most 2 general-purpose registers can be used to pass one object with a trivial copy constructor and a trivial destructor, i.e. only objects with sizeof no greater than 16 can be passed in registers. See Calling conventions by Agner Fog for a detailed treatment of the calling conventions, in particular §7.1 Passing and returning objects. There are separate calling conventions for passing SIMD types in registers.
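
The two conditions above (trivial copy constructor and destructor, size at most 16 bytes) can be written down as a compile-time check. The following is only a rough sketch: register_passing_not_ruled_out and the three structs are names invented for this illustration, and the real ABI classification also looks at member types, so a true result means only that the quoted rule does not force the object into memory.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <type_traits>

// Hypothetical helper approximating the quoted rule. It is NOT the full
// System V classification algorithm, only the conditions named above.
template <class T>
constexpr bool register_passing_not_ruled_out() {
    return std::is_trivially_copy_constructible<T>::value
        && std::is_trivially_destructible<T>::value
        && sizeof(T) <= 16;
}

struct TwoWords   { std::int64_t a, b; };      // 16 bytes
struct ThreeWords { std::int64_t a, b, c; };   // 24 bytes: too large
struct WithDtor   { int v; ~WithDtor() {} };   // non-trivial destructor

void demo_rule() {
    assert(register_passing_not_ruled_out<int*>());
    assert(!register_passing_not_ruled_out<std::unique_ptr<int>>());
    assert(!register_passing_not_ruled_out<WithDtor>());
}
```

With this approximation, int* passes the check while std::unique_ptr<int> fails it purely because of its non-trivial destructor and lack of a trivial copy constructor.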

There are different ABIs for other CPU architectures.

Why is the ABI like that? That is, if the fields of a struct/class fit within registers, or even a single register - why should we not be able to pass it within that register?

It is an implementation detail, but when an exception is handled, during stack unwinding, the objects with automatic storage duration being destroyed must be addressable relative to the function stack frame because the registers have been clobbered by that time. Stack unwinding code needs objects' addresses to invoke their destructors but objects in registers do not have an address.

Pedantically, destructors operate on objects:

An object occupies a region of storage in its period of construction ([class.cdtor]), throughout its lifetime, and in its period of destruction.

and an object cannot exist in C++ if no addressable storage is allocated for it, because an object's identity is its address.

When the address of an object with a trivial copy constructor kept in registers is needed, the compiler can just store the object into memory and obtain the address. If the copy constructor is non-trivial, on the other hand, the compiler cannot just store it into memory; it needs to call the copy constructor, which takes a reference and hence requires the address of the object in the registers. The calling convention probably cannot depend on whether the copy constructor was inlined in the callee or not.
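
To make the point about non-trivial copy constructors concrete, here is a minimal sketch of a type whose copy constructor observes the address of the object it constructs. SelfAware and the two functions are hypothetical names for this illustration. If the value were shuttled through registers and then dropped at some other memory location, the recorded pointer would no longer match.

```cpp
#include <cassert>

// Hypothetical type: every constructor records where the object lives.
struct SelfAware {
    const SelfAware* self;
    SelfAware() : self(this) {}
    SelfAware(const SelfAware&) : self(this) {}               // needs the final address
    SelfAware& operator=(const SelfAware&) { return *this; }  // keep our own address
    bool consistent() const { return self == this; }
};

// By-value parameter: the copy constructor runs on the parameter object
// itself, so it must already have a memory address when it is created.
bool callee(SelfAware s) { return s.consistent(); }

bool demo_self_aware() {
    SelfAware original;
    return original.consistent() && callee(original);
}
```

The check holds only because the parameter is constructed in place at the address the callee later sees, which is what passing by invisible reference provides.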

Another way to think about this is that for trivially copyable types the compiler transfers the value of an object in registers, from which an object can be recovered by plain memory stores if necessary. E.g.:

void f(long*);
void g(long a) { f(&a); }

on x86_64 with System V ABI compiles into:

g(long):                             // Argument a is in rdi.
        push    rax                  // Align stack, faster sub rsp, 8.
        mov     qword ptr [rsp], rdi // Store the value of a in rdi into the stack to create an object.
        mov     rdi, rsp             // Load the address of the object on the stack into rdi.
        call    f(long*)             // Call f with the address in rdi.
        pop     rax                  // Faster add rsp, 8.
        ret                          // The destructor of the stack object is trivial, no code to emit.

In his thought-provoking talk Chandler Carruth mentions that a breaking ABI change may be necessary (among other things) to implement the destructive move that could improve things. IMO, the ABI change could be non-breaking if the functions using the new ABI explicitly opt in to a new, different linkage, e.g. by being declared in an extern "C++20" {} block (possibly in a new inline namespace for migrating existing APIs), so that only code compiled against the new function declarations with the new linkage can use the new ABI.

Note that the ABI doesn't apply when the called function has been inlined. Likewise, with link-time code generation the compiler can inline functions defined in other translation units or use custom calling conventions.

Answer 2:

With common ABIs, non-trivial destructor -> can't pass in registers

(An illustration of a point in @MaximEgorushkin's answer using @harold's example in a comment; corrected as per @Yakk's comment.)

If you compile:

struct Foo { int bar; };
Foo test(Foo byval) { return byval; }

you get:

test(Foo):
        mov     eax, edi
        ret

i.e. the Foo object is passed to test in a register (edi) and also returned in a register (eax).

When the destructor is not trivial (as in the OP's std::unique_ptr example), common ABIs require placement on the stack. This is true even if the destructor does not use the object's address at all.

Thus even in the extreme case of a do-nothing destructor, if you compile:

struct Foo2 {
    int bar;
    ~Foo2() {  }
};

Foo2 test(Foo2 byval) { return byval; }

you get:

test(Foo2):
        mov     edx, DWORD PTR [rsi]
        mov     rax, rdi
        mov     DWORD PTR [rdi], edx
        ret

with useless loading and storing.

Answer 3:

Is this actually an ABI requirement on some platforms? (which?) Or maybe it's just some pessimization in certain scenarios?

If something is visible at the compilation unit boundary then, whether it is defined implicitly or explicitly, it becomes part of the ABI.

Why is the ABI like that?

The fundamental problem is that registers get saved and restored all the time as you move down and up the call stack. So it's not practical to have a reference or pointer to them.

Inlining and the optimizations that result from it are nice when they happen, but an ABI designer can't rely on them happening. They have to design the ABI assuming the worst case. I don't think programmers would be very happy with a compiler where the ABI changed depending on the optimization level.

A trivially copyable type can be passed in registers because the logical copy operation can be split into two parts. The parameters are copied to the registers used for passing parameters by the caller and then copied to the local variable by the callee. Whether the local variable has a memory location or not is thus only the concern of the callee.

A type for which a copy or move constructor must be used, on the other hand, cannot have its copy operation split up in this way, so it must be passed in memory.
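
The two-part copy described above for trivially copyable types can be modelled in ordinary code. This is a sketch under stated assumptions: Point, to_register and from_register are hypothetical names, and a std::uint64_t stands in for a 64-bit parameter register. It models the idea rather than showing what a compiler literally emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Point { std::int32_t x; std::int32_t y; };

static_assert(std::is_trivially_copyable<Point>::value,
              "the split copy is only valid for trivially copyable types");
static_assert(sizeof(Point) == sizeof(std::uint64_t),
              "Point is assumed to fit exactly in one 64-bit register");

// Caller's half: copy the object's bytes into the "register".
std::uint64_t to_register(const Point& p) {
    std::uint64_t reg = 0;
    std::memcpy(&reg, &p, sizeof p);
    return reg;
}

// Callee's half: materialize a local object from the "register".
Point from_register(std::uint64_t reg) {
    Point local;
    std::memcpy(&local, &reg, sizeof local);
    return local;
}

void demo_split_copy() {
    Point q = from_register(to_register(Point{3, 4}));
    assert(q.x == 3 && q.y == 4);
}
```

No constructor needs to run at either end, which is why only the callee has to care whether its local copy ever gets a memory location.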

Has the C++ standards committee discussed this point in recent years, or ever?

I have no idea if the standards bodies have considered this.

The obvious solution to me would be to add proper destructive moves (rather than the current halfway house of a "valid but otherwise unspecified state") to the language, then introduce a way to flag a type as allowing "trivial destructive moves" even if it does not allow trivial copies.

But such a solution would require breaking the ABI of existing code to implement it for existing types, which may meet a fair bit of resistance (though ABI breaks as a result of new C++ standard versions are not unprecedented; for example, the std::string changes in C++11 resulted in an ABI break).
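
As a side note not taken from the answers here, Clang offers a per-type opt-in along these lines as a compiler extension: the [[clang::trivial_abi]] attribute asks for a class to be passed as if it were trivial, with its destructor then run in the callee. The sketch below assumes that attribute; OwnedInt, consume and the SKETCH_TRIVIAL_ABI macro are hypothetical names, and on compilers without the attribute the macro expands to nothing, so the type is simply passed in memory as usual.

```cpp
#include <cassert>
#include <utility>

// Use the Clang extension when available; expand to nothing elsewhere.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define SKETCH_TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef SKETCH_TRIVIAL_ABI
#  define SKETCH_TRIVIAL_ABI
#endif

// Hypothetical move-only owning wrapper, a stand-in for a unique_ptr-like type.
class SKETCH_TRIVIAL_ABI OwnedInt {
public:
    explicit OwnedInt(int v) : ptr_(new int(v)) {}
    OwnedInt(OwnedInt&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }
    OwnedInt(const OwnedInt&) = delete;
    OwnedInt& operator=(const OwnedInt&) = delete;
    ~OwnedInt() { delete ptr_; }
    int value() const { return *ptr_; }
private:
    int* ptr_;
};

// By-value parameter: with the attribute in effect, the callee destroys it.
int consume(OwnedInt p) { return p.value(); }

void demo_trivial_abi() {
    OwnedInt a(42);
    assert(consume(std::move(a)) == 42);
}
```

The observable behaviour is the same with or without the attribute; only the calling convention, and therefore the ABI of every function taking the type by value, changes, which is exactly the compatibility concern raised above.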

Answer 4:

An object must absolutely be passed by address if it must appear to have an address in both of two separately compiled functions:

void callee(int &i) {
   something(&i);
}

void caller() {
   int i;
   callee(i);
   something(&i);
}

Here, even if something(address) is a pure function or macro or whatever (like printf("%p", arg)) that can't store the address or communicate it to another entity, we are required to pass by address, because the address must be well defined for a unique int object that has a unique identity.
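
The identity requirement can be observed directly. In this small sketch (function names are made up for illustration), a callee taking a reference sees the caller's object at the same address, whereas a callee taking an int by value gets a distinct object of its own.

```cpp
#include <cassert>

// Reference parameter: the callee refers to the caller's object itself.
bool by_reference_is(const int& i, const int* expected) {
    return &i == expected;
}

// Value parameter: the callee has its own object with its own address.
bool by_value_is(int i, const int* expected) {
    return &i == expected;
}

void demo_identity() {
    int v = 1;
    assert(by_reference_is(v, &v));   // same object, same address
    assert(!by_value_is(v, &v));      // a copy, living somewhere else
}
```

For a plain int the copy is harmless because nothing depends on where the value lives; the rest of this answer is about types for which that cannot be assumed.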

How do other function calls resemble this pass-by-reference pattern?

Class objects must be constructed by the caller. A constructor formally has a this pointer, but the formalism isn't relevant here: all objects formally have an address, but only those whose address is actually used in an address-y way (unlike *&i = 1;) need to have a well-defined address.

Here the potential for a real use of the address in either a non-trivial constructor or destructor on the caller side is probably the reason for taking the safe, simplistic route: give the object an identity in the caller and pass its address. That ensures any non-trivial use of its address in the constructor, after construction, and in the destructor is consistent: this must appear to be the same throughout the object's existence.

A non-trivial constructor or destructor, like any other function, can use the this pointer in a way that requires consistency over its value, even though some objects with non-trivial special member functions might not:

#include <unistd.h> // dup, close (POSIX)

struct file_handler { // don't use that class!
    int fileno; // owned file descriptor, or -1 if none
    file_handler () { this->fileno = -1; }
    file_handler (int f) { this->fileno = f; }
    file_handler (const file_handler& rhs) {
        // Duplicate the descriptor only if rhs actually holds one.
        if (rhs.fileno != -1)
            this->fileno = dup(rhs.fileno);
        else
            this->fileno = -1;
    }
    ~file_handler () {
        if (this->fileno != -1)
            close(this->fileno);
    }
    file_handler &operator= (const file_handler& rhs);
};

Note that in that case, despite the explicit use of a pointer (the explicit this-> syntax), the object's identity is irrelevant: the compiler could well use a bitwise copy to move the object around and to do "copy elision". This relies on the level of "purity" of the use of this in special member functions (the address doesn't escape).

But purity isn't an attribute available at the standard declaration level (compiler extensions exist that add a purity description to non-inline function declarations), so you can't define an ABI based on the purity of code that may not be available (code may or may not be inline and available for analysis).
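
One such extension is the pure function attribute in GCC and Clang. The sketch below assumes that attribute; sum and the SKETCH_PURE macro are hypothetical names. Note that this attribute promises the absence of side effects, which is related to, but not the same as, the "address of this does not escape" notion of purity used in this answer.

```cpp
#include <cassert>

// GCC/Clang extension; expands to nothing on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_PURE __attribute__((pure))
#else
#  define SKETCH_PURE
#endif

// Declaration as it might appear in a header: the promise is visible to
// callers even though the function body is not.
SKETCH_PURE int sum(const int* data, int count);

// Definition, possibly in another translation unit. It only reads memory
// and returns a value, as the attribute requires.
int sum(const int* data, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += data[i];
    return total;
}

void demo_pure() {
    const int values[] = {1, 2, 3};
    assert(sum(values, 3) == 6);
}
```

The annotation travels with the declaration, which is the property an ABI rule would need; but since it is not part of the standard, a portable ABI cannot rely on it.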

Purity is measured as "certainly pure" or "impure or unknown". The common ground, or upper bound of semantics (actually maximum), or LCM (Least Common Multiple) is "unknown". So the ABI settles on unknown.

Summary:

Some constructs require the compiler to define the object identity.
The ABI is defined in terms of classes of programs and not specific cases that might be optimized.

Possible future work:

Is purity annotation useful enough to be generalized and standardized?

Source: https://stackoverflow.com/questions/58339165/why-can-a-t-be-passed-in-register-but-a-unique-ptrt-cannot

Tags

c++