Why doesn't GCC optimize out deletion of null pointers in C++?

情歌与酒 2021-02-02 05:54

Consider a simple program:

int main() {
  int* ptr = nullptr;
  delete ptr;
}

With GCC (7.2), there is a call instruction regardi

6 answers
  •  谎友^ (original poster)
    2021-02-02 06:35

    It's always safe (for correctness) to let your program call operator delete with a nullptr.

    For performance, it's very rare that having the compiler-generated asm actually do an extra test and conditional branch to skip a call to operator delete will be a win. (You can help gcc optimize away compile-time nullptr deletion without adding a runtime check, though; see below).

    First of all, larger code-size outside of a real hot-spot increases pressure on the L1I cache, and the even smaller decoded-uop cache on x86 CPUs that have one (Intel SnB-family, AMD Ryzen).

    Second, extra conditional branches use up entries in the branch-prediction caches (BTB = Branch Target Buffer and so on). Depending on the CPU, even a branch that's never taken may worsen predictions for other branches if it aliases them in the BTB. (On others, such a branch never gets an entry in the BTB, because the default static prediction of fall-through is already accurate for it, which saves entries for other branches.) See https://xania.org/201602/bpu-part-one.

    If nullptr is rare in a given code path, then on average checking and branching to avoid the call costs your program more time than the check saves.

    If profiling shows you have a hot-spot that includes a delete, and instrumentation / logging shows that it often actually calls delete with a nullptr, then it's worth trying
    if (ptr) delete ptr; instead of just delete ptr;

    Branch prediction might have better luck in that one call site than for the branch inside operator delete, especially if there's any correlation with other nearby branches. (Apparently modern BPUs don't just look at each branch in isolation.) This is on top of saving the unconditional call into the library function (plus another jmp from the PLT stub, from dynamic linking overhead on Unix/Linux).


    If you are checking for null for any other reason, then it could make sense to put the delete inside the non-null branch of your code.
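
    As a minimal sketch of that last point, assume a hypothetical Connection type and close_and_destroy helper (neither comes from the question). The code already has to test the pointer before using it, so the delete can sit inside that branch at no extra cost:

```cpp
// Hypothetical type, used only for illustration.
struct Connection {
    int pending = 0;                 // bytes still waiting to be flushed
    void flush() { pending = 0; }    // stand-in for real cleanup work
};

// The null test is needed anyway (flush() must not run on a null pointer),
// so the delete rides along inside the non-null branch.
// Returns true if an object was actually destroyed.
inline bool close_and_destroy(Connection*& conn) {
    if (conn) {
        conn->flush();
        delete conn;
        conn = nullptr;   // leave the caller's pointer in a safe state
        return true;
    }
    return false;         // null: nothing to flush, no delete-expression runs
}
```

    This is different from adding if (ptr) purely to skip the delete: here the branch exists for other reasons, so no extra check is introduced.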

    You can avoid delete calls in cases where gcc can prove (after inlining) that a pointer is null, but without doing a runtime check if not:

    static inline bool
    is_compiletime_null(const void *ptr) {
    #ifdef __GNUC__
        // __builtin_constant_p(ptr) is false even for nullptr,
        // but checking the result of booleanizing it works.
        return __builtin_constant_p(!ptr) && !ptr;
    #else
        // No GNU extensions: report "not provably null" so callers always delete.
        return false;
    #endif
    }
    

    It will always return false with clang because it evaluates __builtin_constant_p before inlining. But since clang already skips delete calls when it can prove a pointer is null, you don't need it.

    This might actually help in std::move cases, and you can safely use it anywhere with (in theory) no performance downside. It always compiles to if(true) or if(false), so it's very different from if(ptr), which is likely to result in a runtime branch because the compiler probably can't prove the pointer is non-null in most cases either. (A dereference might let it, though, because a null deref would be UB, and modern compilers optimize based on the assumption that the code doesn't contain any UB.)
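
    To illustrate the std::move case, here is a rough sketch using a hypothetical Owner class and Payload type (both invented for this example, not a replacement for std::unique_ptr). After the move constructor is inlined, gcc may be able to see that the moved-from object's pointer is null and drop the delete in its destructor; whether it actually does depends on the compiler version and optimization level. The behaviour is the same either way.

```cpp
#include <utility>

// Same helper as above, repeated so this sketch is self-contained.
static inline bool is_compiletime_null(const void* ptr) {
#ifdef __GNUC__
    return __builtin_constant_p(!ptr) && !ptr;
#else
    return false;
#endif
}

// Hypothetical payload that counts how many times its destructor runs.
struct Payload {
    static int destroyed;
    ~Payload() { ++destroyed; }
};
int Payload::destroyed = 0;

// Hypothetical minimal owning wrapper.
class Owner {
    Payload* p_;
public:
    explicit Owner(Payload* p) : p_(p) {}
    Owner(Owner&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
    Owner(const Owner&) = delete;
    Owner& operator=(const Owner&) = delete;
    ~Owner() {
        // Folds to if(true)/if(false) at compile time; never a runtime test.
        if (!is_compiletime_null(p_))
            delete p_;
    }
};

// Moves ownership from a to b, destroys both, and reports destructor runs.
int move_and_count() {
    Payload::destroyed = 0;
    {
        Owner a(new Payload);
        Owner b(std::move(a));   // a now holds nullptr
    }
    return Payload::destroyed;
}
```

    The payload is destroyed exactly once whether or not the compiler proves the moved-from pointer null, because deleting a null pointer is a no-op.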

    You could make this a macro to avoid bloating non-optimized builds (and so it would "work" without having to inline first). You can use a GNU C statement-expression to avoid double-evaluating the macro arg (see examples for GNU C min() and max()). For the fallback for compilers without GNU extensions, you could write ((ptr), false) or something to evaluate the arg once for side effects while producing a false result.
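
    A possible shape for such a macro, as a sketch only: IS_COMPILETIME_NULL and delete_unless_known_null are names made up here, and the (void) cast in the fallback is an addition to silence unused-value warnings. Whether the GNU branch ever folds to true depends on the optimization level, and the const void* temporary means it only accepts object pointers.

```cpp
#ifdef __GNUC__
// GNU C statement-expression: (ptr) is evaluated exactly once into a local,
// then the same __builtin_constant_p test as the inline function is applied.
#define IS_COMPILETIME_NULL(ptr)                         \
    __extension__({                                      \
        const void* ict_tmp_ = (ptr);                    \
        __builtin_constant_p(!ict_tmp_) && !ict_tmp_;    \
    })
#else
// Fallback: evaluate the argument once for side effects, then answer false.
#define IS_COMPILETIME_NULL(ptr) ((void)(ptr), false)
#endif

// Hypothetical call site. `evaluations` counts how often the macro argument
// was evaluated, to show that it is not double-evaluated.
inline void delete_unless_known_null(int* p, int& evaluations) {
    if (!IS_COMPILETIME_NULL((++evaluations, p)))
        delete p;
}
```

    Passing an argument with a side effect, as the helper does, increments the counter once per call on either branch of the #ifdef.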

    Demonstration: asm from gcc6.3 -O3 on the Godbolt compiler explorer

    void foo(int *ptr) {
        if (!is_compiletime_null(ptr))
            delete ptr;
    }
    
        # compiles to a tailcall of operator delete
        jmp     operator delete(void*)
    
    
    void bar() {
        foo(nullptr);
    }
    
        # optimizes out the delete
        rep ret
    

    It compiles correctly with MSVC (also on the compiler explorer link), but with the test always returning false, bar() is:

        # MSVC doesn't support GNU C extensions, and doesn't skip nullptr deletes itself
        mov      edx, 4
        xor      ecx, ecx
        jmp      ??3@YAXPEAX_K@Z      ; operator delete
    

    Interesting to note that MSVC's operator delete takes the object size as a function arg (mov edx, 4), but gcc/Linux/libstdc++ code just passes the pointer.
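
    The size argument is not MSVC-specific in principle: C++14 added a global sized operator delete(void*, std::size_t), and whether a compiler emits calls to it depends on the compiler and its flags. A class-specific sized operator delete behaves the same way on any conforming compiler, so it makes an easy sketch of what the extra argument carries. Tracked and delete_and_report_size are hypothetical names for this example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class with its own allocation functions.
struct Tracked {
    int payload;

    static std::size_t last_size;   // size most recently passed to operator delete

    static void* operator new(std::size_t sz) {
        if (void* p = std::malloc(sz)) return p;
        throw std::bad_alloc();
    }

    // Two-parameter form: the delete-expression passes the object size,
    // much like the `mov edx, 4` in the MSVC output above.
    static void operator delete(void* p, std::size_t sz) noexcept {
        last_size = sz;
        std::free(p);
    }
};
std::size_t Tracked::last_size = 0;

// Allocates and deletes one object, then reports the size that was passed.
inline std::size_t delete_and_report_size() {
    Tracked* t = new Tracked();
    delete t;
    return Tracked::last_size;
}
```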


    Related: I found this blog post, using C11 (not C++11) _Generic to try to portably do something like __builtin_constant_p null-pointer checks inside static initializers.
