What is the intention of ODR?

囚心锁ツ 2021-01-06 16:11

I do understand what the ODR says, but I don't understand what it tries to achieve.

I see two consequences of violating it - user will get syntax error, which is totall

4 answers
  • 2021-01-06 16:17

    The ODR dictates which C++ programs are well-formed. An ODR violation means your program is ill-formed, and the standard does not dictate what the program will do, whether it should compile, etc. Most ODR violations are marked "no diagnostic required" to make the compiler writer's job easier.

    This permits the C++ compiler to make certain simplifying assumptions about the code you feed it, like that ::A is the same struct type everywhere, and not have to check at each point of use.

    The compiler is free to take your code and compile it into a program that formats c:, or into anything else. It is also free to detect ODR violations, use them to prove that a branch of code cannot run, and eliminate the paths that lead there.

  • 2021-01-06 16:29

    To put it simply, the One Definition Rule guarantees:

    1. That entities that should be defined only once in the program are defined exactly once.

    2. That entities that can be defined in multiple translation units (classes, inline functions, function templates) have equivalent definitions that result in equivalent compiled code. The equivalence has to be perfect, so that any one of the definitions can be used at run time: the many definitions are indistinguishable.
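
    As a minimal sketch of the two cases above, the snippet below puts what would normally be a header and a source file into one listing. The names shared.hpp, shared.cpp, Point, manhattan, and scale are hypothetical and chosen only for illustration.

```cpp
// shared.hpp (hypothetical header name), shown inline for brevity.
#ifndef SHARED_HPP
#define SHARED_HPP

// Case 2: a class may be defined once in every translation unit, as long
// as all copies are identical. Including one header everywhere does that.
struct Point {
    int x;
    int y;
};

// Case 2 again: an inline function follows the same rule as the class.
inline int manhattan(const Point& p) {
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

// A declaration of a non-inline function may be repeated in many units.
int scale(int value);

#endif // SHARED_HPP

// shared.cpp (hypothetical source file).
// Case 1: the definition of a non-inline function appears exactly once
// in the whole program.
int scale(int value) { return value * 2; }
```

    Every translation unit that includes the header sees the same Point and the same manhattan(), while scale() has a single definition for the linker to find.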

  • 2021-01-06 16:33

    When a function expects to get one of these structs, and you redeclare it as something different, which struct does that function receive, and how? Remember, C++ is statically typed, so if you pass a struct in by value, the function must know its layout. Because C++ is type-safe, allowing violations of the ODR would break this type safety.

    Most importantly, what would be gained by not having the ODR? I can think of hundreds of things that would be harder without it, and nothing to gain. There is literally no flexibility to be achieved from being able to stomp on previously declared types in the same namespace. At best, multiple inclusion would no longer require header guards, which is a very minimal gain.

  • To my knowledge, the rule's purpose is to prevent an object from being defined differently in different translation units.

    // a.cpp
    #include <iostream>
    
    class SharedClass {
        int a, b, c;
        bool d;
        int e, f, g;
    
      public:
        // ...
    };
    
    void a(const SharedClass& sc) {
        std::cout << "sc.a: " << sc.getA() << '\n'
                  << "sc.e: " << sc.getE() << '\n'
                  << "sc.c: " << sc.getC() << std::endl;
    }
    
    // -----
    
    // b.cpp
    class SharedClass {
        int b, e, g, a;
        bool d;
        int c, f;
    
      public:
        // ...
    };
    
    void b(SharedClass& sc) {
        sc.setA(sc.getA() - 13);
        sc.setG(sc.getG() * 2);
        sc.setD(true);
    }
    
    // -----
    
    // main.cpp
    int main() {
        SharedClass sc;
        /* Assume that the compiler doesn't get confused & have a heart attack,
         *  and uses the definition in "a.cpp".
         * Assume that by the definition in "a.cpp", this instance has:
         *   a = 3
         *   b = 5
         *   c = 1
         *   d = false
         *   e = 42
         *   f = -129
         *   g = 8
         */
    
        // ...
    
        a(sc); // Outputs sc.a, sc.e, and sc.c.
        b(sc); // Supposedly modifies sc.a, sc.g, and sc.d.
        a(sc); // Does NOT do what you think it does.
    }
    

    Considering this program, you might think SharedClass would behave identically in both a.cpp and b.cpp, since it has the same fields with the same names. However, notice that the fields are in a different order. Because of this, each translation unit will see it like this (assuming 4-byte ints, and 4-byte alignment):

    If the compiler uses hidden alignment members:

    // a.cpp
    Class layout:
    0x00: int  {a}
    0x04: int  {b}
    0x08: int  {c}
    0x0C: bool {d}
    0x0D: [alignment member, 3 bytes]
    0x10: int  {e}
    0x14: int  {f}
    0x18: int  {g}
    Size: 28 bytes.
    
    // b.cpp
    Class layout:
    0x00: int  {b}
    0x04: int  {e}
    0x08: int  {g}
    0x0C: int  {a}
    0x10: bool {d}
    0x11: [alignment member, 3 bytes]
    0x14: int  {c}
    0x18: int  {f}
    Size: 28 bytes.
    
    // main.cpp
    One of the above, up to the compiler.
    Alternatively, may be seen as undefined.
    

    If the compiler puts same-sized fields together, ordered from largest to smallest:

    // a.cpp
    Class layout:
    0x00: int  {a}
    0x04: int  {b}
    0x08: int  {c}
    0x0C: int  {e}
    0x10: int  {f}
    0x14: int  {g}
    0x18: bool {d}
    Size: 25 bytes.
    
    // b.cpp
    Class layout:
    0x00: int  {b}
    0x04: int  {e}
    0x08: int  {g}
    0x0C: int  {a}
    0x10: int  {c}
    0x14: int  {f}
    0x18: bool {d}
    Size: 25 bytes.
    
    // main.cpp
    One of the above, up to the compiler.
    Alternatively, may be seen as undefined.
    

    Notice, if you will, that while the class has the same size in both definitions, its members are in a completely different order.

    Field comparison (with alignment member):
    a.cpp field     b.cpp field
    a               b
    b               e
    c               g
    d & {align}     a
    e               d & {align}
    f               c
    g               f
    
    Field comparison (with hidden reordering):
    a.cpp field     b.cpp field
    a               b
    b               e
    c               g
    e               a
    f               c
    g               f
    d               d
    

    So, from a()'s perspective, b() actually changes sc.c, sc.d, and sc.e rather than sc.a, sc.g, and sc.d: the write to g always lands on c, while the writes to a and d land on d and e (with the alignment member) or on e and d (with hidden reordering), completely changing the second call's output. [Note that this can even come up in supposedly-innocuous situations where you'd never expect it, such as if both a.cpp and b.cpp had the same definition for SharedClass, but specified different alignments. This would change the size of the alignment member, again giving the class different memory layouts in different translation units.]
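
    The mismatch can be checked without violating the ODR by giving the two layouts different names and comparing member offsets. This is only a sketch: LayoutA, LayoutB, and field_in_a are hypothetical names, the members are made public so that offsetof applies, and the results in the usage note assume 4-byte ints with 4-byte alignment (the "alignment member" case above).

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <string>

// Stand-in for the SharedClass definition in a.cpp.
struct LayoutA {
    int a, b, c;
    bool d;
    int e, f, g;
};

// Stand-in for the SharedClass definition in b.cpp.
struct LayoutB {
    int b, e, g, a;
    bool d;
    int c, f;
};

// Returns the name of the LayoutA member that starts at `offset`,
// or "?" if no LayoutA member starts there.
std::string field_in_a(std::size_t offset) {
    if (offset == offsetof(LayoutA, a)) return "a";
    if (offset == offsetof(LayoutA, b)) return "b";
    if (offset == offsetof(LayoutA, c)) return "c";
    if (offset == offsetof(LayoutA, d)) return "d";
    if (offset == offsetof(LayoutA, e)) return "e";
    if (offset == offsetof(LayoutA, f)) return "f";
    if (offset == offsetof(LayoutA, g)) return "g";
    return "?";
}

// The LayoutA members that b()'s three writes would land on.
std::string target_of_set_a() { return field_in_a(offsetof(LayoutB, a)); }
std::string target_of_set_g() { return field_in_a(offsetof(LayoutB, g)); }
std::string target_of_set_d() { return field_in_a(offsetof(LayoutB, d)); }
```

    Under the stated assumptions, target_of_set_a() is expected to name d, target_of_set_g() to name c, and target_of_set_d() to name e, which matches the field comparison table for the alignment-member layout.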

    Now, that's what can happen if the same fields are laid out differently in different translation units. Imagine what would happen if the class had entirely different fields in different units.

    // c.cpp
    #include <string>
    #include <utility>
    
    // Assume alignment of 4.
    // Assume std::string stores a pointer to string memory, size_t (as long long), and pointer
    //  to allocator in its body, and is thus 16 (on 32-bit) or 24 (on 64-bit) bytes.
    // (Note that this is likely not the ACTUAL size of std::string, but I'm just using it for an
    //  example.)
    class SharedClass {
        char c;
        std::string str;
        short s;
        unsigned long long ull;
        float f;
    
      public:
        // ...
    };
    
    void c(SharedClass& sc, std::string str) {
        sc.setStr(std::move(str));
    }
    

    In this file, our SharedClass would be something like this:

    Class layout (32-bit, alignment member):
    0x00: char                {c}
    0x01: [alignment member, 3 bytes]
    0x04: string              {str}
    0x14: short               {s}
    0x16: [alignment member, 2 bytes]
    0x18: unsigned long long  {ull}
    0x20: float               {f}
    Size: 36 bytes.
    
    Class layout (64-bit, alignment member):
    0x00: char                {c}
    0x01: [alignment member, 3 bytes]
    0x04: string              {str}
    0x1C: short               {s}
    0x1E: [alignment member, 2 bytes]
    0x20: unsigned long long  {ull}
    0x28: float               {f}
    Size: 44 bytes.
    
    Class layout (32-bit, reordered):
    0x00: string              {str}
    0x10: unsigned long long  {ull}
    0x18: float               {f}
    0x1C: short               {s}
    0x1E: char                {c}
    Size: 31 bytes.
    
    Class layout (64-bit, reordered):
    0x00: string              {str}
    0x18: unsigned long long  {ull}
    0x20: float               {f}
    0x24: short               {s}
    0x26: char                {c}
    Size: 39 bytes.
    

    Not only will this SharedClass have different fields, it's an entirely different size. Trying to treat each translation unit as if they have the same SharedClass can and will break something, and silently reconciling each definition with each other is impossible. Just imagine the chaos that would happen if we called a(), b(), and c() on the same instance of SharedClass, or even what would happen if we tried to make an instance of SharedClass. With three different definitions, and the compiler having no idea which one is the actual definition, things can and will go poorly.

    This completely breaks inter-unit operability, requiring that all of the code that uses a class either be in the same translation unit, or share the exact same definition of the class in every unit. Due to this, the ODR requires that a class be defined only once per unit, and share the same definition across all units, to guarantee that it will always have the same definition, and prevent this entire issue.


    Similarly, consider this simple function, func().

    // z.cpp
    #include <cmath>
    
    int func(int x, int y) {
        return static_cast<int>(std::round(std::pow((2 * x) - (3 * y), x + y) - (x / y)));
    }
    
    // -----
    
    // y.cpp
    int func(int x, int y) { return x + y; }
    
    // -----
    
    // x.cpp
    int q = func(9, 11);
    // Compiler has a heart attack, call 911.
    

    The toolchain won't be able to tell which version of func() you mean: both definitions have the same mangled name, so the linker will typically either reject the duplicate symbol or treat the two as the same function. This, naturally, will break things. It gets even worse when one version has side effects (such as changing global state, or causing a memory leak), and the other doesn't.

    In this case, the ODR is intended to guarantee that any given function will share the same definition across all translation units, instead of having different definitions in different units. This one would be somewhat easy to change (by treating all functions as inline for the purpose of the ODR, but otherwise only treating them as inline if explicitly or implicitly declared as such), but this could cause trouble in unforeseen ways.
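
    If two functions really are meant to be different, one way to stay within the ODR is to give them different qualified names. The sketch below does this with two namespaces, z_impl and y_impl, which are hypothetical names introduced here and not part of the example above; giving each function internal linkage in its own file would be another option.

```cpp
#include <cmath>

// Stand-in for the func() from z.cpp, now with its own qualified name.
namespace z_impl {
int func(int x, int y) {
    return static_cast<int>(
        std::round(std::pow((2 * x) - (3 * y), x + y) - (x / y)));
}
}  // namespace z_impl

// Stand-in for the func() from y.cpp.
namespace y_impl {
int func(int x, int y) { return x + y; }
}  // namespace y_impl

// The caller now states which function it means, so nothing is ambiguous.
int q = y_impl::func(9, 11);
```

    With distinct names, the two functions get distinct symbols and the call in x.cpp resolves to exactly one of them.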


    Now, consider a simpler case, global variables.

    // i.cpp
    int global_int;
    
    namespace Globals {
        int ns_int = -5;
    }
    
    // -----
    
    // j.cpp
    int global_int;
    
    namespace Globals {
        int ns_int = 5;
    }
    

    In this case, each translation unit defines the variables global_int and Globals::ns_int, meaning that the program will have two distinct variables with the exact same mangled name. This cannot end well during the linking phase, where the linker sees every instance of a symbol as referring to the same entity. Globals::ns_int will have more issues than global_int, due to having two different initialisation values hardcoded into the files; assuming the linker doesn't just explode, the program is guaranteed to have undefined behaviour.
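
    The usual way to share such variables is to declare them with extern in a header and define them in exactly one source file. The sketch below shows that arrangement in a single listing; the file name globals.hpp and the helper read_ns_int are hypothetical.

```cpp
// globals.hpp (hypothetical header): declarations only, so it is safe to
// include in every translation unit.
extern int global_int;

namespace Globals {
extern int ns_int;
}

// i.cpp: the one and only definition of each variable.
int global_int = 0;

namespace Globals {
int ns_int = -5;
}

// j.cpp would include globals.hpp and use the variables without
// defining them again.
int read_ns_int() { return Globals::ns_int; }
```

    Here there is only one initialiser for Globals::ns_int, so every unit agrees on its value.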


    The ODR varies in complexity, depending on the entity in question. Some things can only have one definition in the entire program, but some can have multiple definitions as long as they're all exactly the same and there's only one per translation unit. No matter the case, the intent is that every unit will see the entity in exactly the same way.

    The main reason for this, though, is convenience. Not only is it easier for the compiler to assume that the ODR has been followed to the letter across every translation unit, it's faster and less CPU-, memory-, and disk-intensive. If compilers had to enforce the ODR instead of assuming it, the compiler would have to compare every single translation unit to make sure that every shared type and inline function definition was the same, and that every global variable and non-inline function was only defined in a single translation unit. This, naturally, would require that it load every unit from disk whenever it compiled any unit, using a lot of system resources that it won't actually need if the programmer followed good programming practices. In light of this, forcing programmers to follow the ODR lets the compiler assume that everything is fine and dandy, making its job (and the programmer's working and/or goofing off while waiting on the compiler) much easier. [Compared to this, making sure that the ODR is followed within a single unit is child's play.]
