Inline member operators vs inline operators C++

Question

If I have two structs:

struct A
{
    float x, y;
    inline A operator*(A b) 
    {
        A out;
        out.x = x * b.x;
        out.y = y * b.y;
        return out;
    } 
};

And an equivalent struct

struct B
{
    float x, y;
};

inline B operator*(B a, B b) 
{
    B out;
    out.x = a.x * b.x;
    out.y = a.y * b.y;
    return out;
}

Would you know of any reason for B's operator* to compile any differently, or run any slower or faster than A's operator* (the actual actions that go on inside the functions should be irrelevant)?

What I mean is... would declaring the inline operator as a member, vs not as a member, have any generic effect on the speed of the actual function, whatsoever?

I have a number of structs that currently follow the inline member operator style, but I want to modify them to be valid C code instead. Before I do that, I'd like to know whether performance or compilation would change.

Answer 1:

The way you have it written, I'd expect B's operator* to run slightly slower. This is because the "under the hood" implementation of A::operator* is roughly:

// Pseudocode: the implicit this pointer is written out as a parameter (not valid C++).
inline A A::operator*(A* this, A b) 
{ 
    A out;
    out.x = this->x * b.x;
    out.y = this->y * b.y;
    return out;
}

So A passes a pointer to its left-hand-side argument to the function, while B has to make a copy of that parameter before calling the function. Both have to make copies of their right-hand-side parameters.
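To make those copy counts visible, here is a minimal sketch that counts copy constructions for a by-value member operator and a by-value free operator. The names M, F, g_copies, copies_for_member and copies_for_free are made up for this illustration. The counts describe what the language requires before optimization; for a plain struct of two floats, an optimizer that inlines the call will normally remove the copies entirely.

```cpp
static int g_copies = 0;  // incremented by every copy construction

// Member operator taking its argument by value, like A in the question.
struct M {
    float x, y;
    M(float ax, float ay) : x(ax), y(ay) {}
    M(const M& o) : x(o.x), y(o.y) { ++g_copies; }
    // The left-hand side arrives through this; only b is copied.
    M operator*(M b) { return M(x * b.x, y * b.y); }
};

// Free operator taking both arguments by value, like B in the question.
struct F {
    float x, y;
    F(float ax, float ay) : x(ax), y(ay) {}
    F(const F& o) : x(o.x), y(o.y) { ++g_copies; }
};

// Both a and b are copied into the parameters.
F operator*(F a, F b) { return F(a.x * b.x, a.y * b.y); }

int copies_for_member() {
    M a(1.0f, 2.0f), b(3.0f, 4.0f);
    g_copies = 0;
    M r = a * b;  // the returned temporary is constructed in place
    (void)r;
    return g_copies;
}

int copies_for_free() {
    F a(1.0f, 2.0f), b(3.0f, 4.0f);
    g_copies = 0;
    F r = a * b;
    (void)r;
    return g_copies;
}
```

Calling copies_for_member() and copies_for_free() lets you compare the two styles directly; the member version copies only the right-hand side, while the free version copies both operands.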

Your code would be much better off, and A and B would probably compile to the same thing, if you wrote it using references and made it const-correct:

struct A
{
    float x, y;
    inline A operator*(const A& b) const 
    {
        A out;
        out.x = x * b.x;
        out.y = y * b.y;
        return out;
    } 
};

struct B
{
    float x, y;
};

inline B operator*(const B& a, const B& b) 
{
    B out;
    out.x = a.x * b.x;
    out.y = a.y * b.y;
    return out;
}

You still want to return objects, not references, since the results are effectively temporaries (you're not returning a modified existing object).
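As a rough illustration of why the const-correct, return-by-value form is convenient, the sketch below chains three multiplications on const operands. The product3 helper is a hypothetical name added for this example; the structs repeat the const-reference versions so the snippet compiles on its own.

```cpp
struct A {
    float x, y;
    A operator*(const A& b) const {
        A out;
        out.x = x * b.x;
        out.y = y * b.y;
        return out;
    }
};

struct B {
    float x, y;
};

inline B operator*(const B& a, const B& b) {
    B out;
    out.x = a.x * b.x;
    out.y = a.y * b.y;
    return out;
}

// a is const here, so a * b only compiles because A::operator* is a const
// member. The result of a * b is a temporary returned by value, which then
// acts as the left-hand side of the second multiplication.
A product3(const A& a, const A& b, const A& c) { return a * b * c; }

// For B, the temporary produced by a * b binds to the const B& parameter
// of the second call.
B product3(const B& a, const B& b, const B& c) { return a * b * c; }
```

Without the const on the member operator, the A overload of product3 would not compile, because a is a reference to const.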

Addendum

Follow-up question: "However, with const pass-by-reference for both arguments in B, would that make it effectively faster than A, due to the dereferencing?"

First off, both involve the same dereferencing when you spell out all the code. (Remember, accessing members of this implies a pointer dereference.)

But even then, it depends on how smart your compiler is. In this case, let's say it looks at your structure and decides it can't stuff it in a register because it's two floats, so it will use pointers to access them. So the dereferenced pointer case (which is what references get implemented as) is the best you'll get. The assembly is going to look something like this (this is pseudo-assembly-code):

// Setup for the function. Usually already done by the inlining.
r1 <- this
r2 <- &result
r3 <- &b

// Actual function.
r4 <- r1[0]
r4 <- r4 * r3[0]
r2[0] <- r4
r4 <- r1[4]
r4 <- r4 * r3[4]
r2[4] <- r4

This is assuming a RISC-like architecture (say, ARM). x86 probably uses fewer instructions, but the instruction decoder expands them to about this level of detail anyway. The point is that it's all fixed-offset dereferences of pointers held in registers, which is about as fast as it will get. The optimizer can try to be smarter and spread the objects across several registers, but that kind of optimizer is a lot harder to write. (Though I have a sneaking suspicion that an LLVM-type compiler/optimizer could do that optimization easily if result were merely a temporary object that is not preserved.)

So, since you're using this, you have an implicit pointer dereference. But what if the object were on the stack? Doesn't help; stack variables turn into fixed-offset dereferences of the stack pointer (or frame pointer, if used). So you're dereferencing a pointer somewhere in the end, unless your compiler is bright enough to take your object and spread it across multiple registers.

Feel free to pass the -S option to gcc to get an assembly listing of the generated code and see what's really happening in your case.
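One way to set that experiment up, as a sketch: put both operators in a small file and wrap each in a non-inline function, so that each wrapper shows up under its own label in the assembly output. The file name ops.cpp and the wrapper names mul_member and mul_free are assumptions made for this example; -S and -O2 are standard gcc options.

```cpp
// ops.cpp (hypothetical file name). To inspect the generated code:
//   g++ -O2 -S ops.cpp -o ops.s
// then look for mul_member and mul_free in ops.s and compare the bodies.

struct A {
    float x, y;
    A operator*(const A& b) const {
        A out;
        out.x = x * b.x;
        out.y = y * b.y;
        return out;
    }
};

struct B {
    float x, y;
};

inline B operator*(const B& a, const B& b) {
    B out;
    out.x = a.x * b.x;
    out.y = a.y * b.y;
    return out;
}

// Non-inline wrappers with external linkage: each is emitted as a labelled
// function even when the operator itself has been inlined into it.
A mul_member(const A& a, const A& b) { return a * b; }
B mul_free(const B& a, const B& b) { return a * b; }
```

If the two wrapper bodies come out the same for your compiler and flags, the member versus non-member choice makes no difference there. That is something to check on your own toolchain, not a guaranteed result.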

Answer 2:

You really should leave inline-ing to the compiler.

That said, functions defined within the class definition (as is the case with A) are implicitly inline. The inline specifier on A::operator* is redundant.

The more interesting case is when the member function is defined outside the class definition. There, inline is needed if the definition lives in a header that is included from more than one source file (otherwise the linker sees multiple definitions). It also serves as a hint to the compiler, which it may ignore at will, that the function is called often and its instructions should be expanded in line in the caller.
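A minimal sketch of that out-of-class case, reusing the struct from the question (the idea that this sits in a shared header is an assumption of the example):

```cpp
// Imagine this lives in a header that several source files include.
struct A {
    float x, y;
    A operator*(const A& b) const;  // declared in the class, defined below
};

// Defined outside the class, so it is not implicitly inline. The keyword
// allows the same definition to appear in every file that includes the
// header; whether calls are actually expanded in place is up to the compiler.
inline A A::operator*(const A& b) const {
    A out;
    out.x = x * b.x;
    out.y = y * b.y;
    return out;
}
```

Callers use it exactly like the in-class version, for example A c = a * b;.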

See section 9 (inline functions) of the C++ FAQ.

Answer 3:

Here is how I would write the struct:

struct A
{
    float x, y;
    A(float ax, float ay) : x(ax), y(ay) { }
    A operator*(const A& b) const { return A(x * b.x, y * b.y); } 
};

To answer the question, yes writing an operator as a member function can be ever so slightly faster in certain circumstances, but not enough to make a noticeable difference in your code.

Some notes:

Never worry about using the inline keyword. Optimizing compilers make their own decisions about what to inline and what not to.
Use initializing constructors. Do it because they improve code readability. Sleep better knowing that they can bring small performance benefits.
Pass structs by const reference as often as possible.
Focus on writing code that has good style, not code that is fast. Most code is fast enough, and if it isn't, it is probably because of something boneheaded in the algorithms or the handling of I/O.

Source: https://stackoverflow.com/questions/10670560/inline-member-operators-vs-inline-operators-c

Tags

c++

performance

struct

inline

operator-keyword