Can compiler optimization introduce bugs?

名媛妹妹 2020-11-27 13:23

Today I had a discussion with a friend of mine and we debated for a couple of hours about "compiler optimization".

I defended the point that sometimes

22 Answers
  • 2020-11-27 13:46

    Just one example: a few days ago, someone discovered that gcc 4.5 with the option -foptimize-sibling-calls (which is implied by -O2) produces an Emacs executable that segfaults on startup.

    This has apparently been fixed since.

  • 2020-11-27 13:46

    More, and more aggressive, optimizations can be enabled if the program you compile has a good test suite. Then it is possible to run that suite and be somewhat more confident that the program operates correctly. You can also prepare your own tests that closely match what you plan to do in production.

    It is also true that any large program may have (and probably does have) some bugs regardless of which switches you use to compile it.

  • 2020-11-27 13:46

    I work on a large engineering application, and every now and then we see release-only crashes and other problems reported by clients. Our code has 37 files (out of around 6,000) with this at the top of the file, to turn off optimization and fix such crashes:

    #pragma optimize( "", off)
    

    (We use Microsoft Visual C++ native, 2015, but it is true for just about any compiler, except maybe Intel Fortran 2016 update 2, where we have not yet had to turn off any optimizations.)
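
    If a crash can be pinned down to particular functions, the same pragma can also be scoped more narrowly than a whole file. A minimal sketch of that pattern (the function name is hypothetical, not from our code base):

      // MSVC-specific: disable all optimizations for the functions that follow.
      #pragma optimize("", off)
      void fragile_release_only_function()   // hypothetical function that misbehaves in release builds
      {
          // ... code that crashes when optimized ...
      }
      // Restore the optimization settings given on the command line (/O options).
      #pragma optimize("", on)

    That keeps the rest of the translation unit fully optimized while you investigate or wait for a compiler fix.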

    If you search the Microsoft Visual Studio feedback site you can find some optimization bugs there as well. We occasionally log some of ours (when we can reproduce one easily enough with a small section of code and are willing to take the time), and they do get fixed, but sadly others get introduced again. (smiles)

    Compilers are programs written by people, and any big program has bugs; trust me on that. The compilers' optimization code most certainly has bugs, and turning on optimization can certainly introduce bugs into your program.

  • 2020-11-27 13:47

    Yes. A good example is the double-checked locking pattern. In C++ (at least before the C++11 memory model) there is no way to safely implement double-checked locking, because the compiler can re-order instructions in ways that make sense in a single-threaded program but not in a multi-threaded one. A full discussion can be found at http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf (a minimal sketch of the broken pattern is shown below).
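
    A minimal sketch of the classic (broken) pattern the paper analyzes, using a hypothetical Singleton class:

      #include <mutex>

      // Classic double-checked locking. The assignment to `instance` can be
      // reordered so the pointer becomes visible before the constructor has
      // finished, and the unlocked first read of `instance` is a data race;
      // another thread can therefore observe a half-constructed object.
      class Singleton {
      public:
          static Singleton* getInstance() {
              if (instance == nullptr) {                   // first check, no lock
                  std::lock_guard<std::mutex> lock(mtx);
                  if (instance == nullptr) {               // second check, under lock
                      instance = new Singleton();          // store may be reordered
                  }
              }
              return instance;
          }
      private:
          Singleton() = default;
          static Singleton* instance;
          static std::mutex mtx;
      };

      Singleton* Singleton::instance = nullptr;
      std::mutex Singleton::mtx;

    With C++11, a std::atomic pointer with appropriate ordering, or simply a function-local static, avoids the problem.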

  • 2020-11-27 13:49

    To combine the other posts:

    1. Compilers do occasionally have bugs in their code, like most software. The "smart people" argument is completely irrelevant here; NASA satellites and other systems built by smart people also have bugs. The code that performs optimization is different from the code that doesn't, so if the bug happens to be in the optimizer, your optimized build may indeed contain errors while your non-optimized build does not.

    2. As Mr. Shiny and New pointed out, it's possible for code that is naive with regard to concurrency and/or timing issues to run satisfactorily without optimization yet fail with optimization, since optimization may change the timing of execution. You could blame such a problem on the source code, but if it only manifests when optimized, some people might blame the optimization.

  • 2020-11-27 13:50

    Compiler optimizations can introduce bugs or undesirable behaviour. That's why you can turn them off.

    One example: a compiler can optimize the read/write access to a memory location, doing things like eliminating duplicate reads or duplicate writes, or re-ordering certain operations. If the memory location in question is only used by a single thread and really is ordinary memory, that may be fine. But if the memory location is a hardware device I/O register, then re-ordering or eliminating writes may be completely wrong. In this situation you normally have to write code knowing that the compiler might "optimize" it, and thus knowing that the naive approach doesn't work.
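
    For the hardware-register case, the usual tool in C and C++ is volatile, which forbids the compiler from merging, dropping, or re-ordering the accesses to that object. A minimal sketch, with a made-up register address:

      #include <stdint.h>

      /* Hypothetical memory-mapped UART status register; the address is
         invented for illustration. Without volatile, the compiler could
         collapse the polling loop into a single read, or hoist the read
         out of the loop entirely. */
      #define UART_STATUS (*(volatile uint32_t *)0x4000A000u)

      void wait_until_ready(void) {
          while ((UART_STATUS & 0x1u) == 0) {
              /* every iteration performs a real read of the device register */
          }
      }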

    Update: As Adam Robinson pointed out in a comment, the scenario I describe above is more of a programming error than an optimizer error. But the point I was trying to illustrate is that a program which is otherwise correct, combined with an optimization which otherwise works properly, can contain bugs once the two are combined. In some cases the language specification says "you must do things this way, because these kinds of optimizations may occur and your program will fail", in which case it's a bug in the code. But sometimes a compiler has a (usually optional) optimization feature that can generate incorrect code, because the compiler is trying too hard to optimize or can't detect that the optimization is inappropriate. In this case the programmer must know when it is safe to turn on the optimization in question.

    Another example: the Linux kernel had a bug where a potentially NULL pointer was being dereferenced before the test for that pointer being NULL. However, in some cases it was possible to map memory at address zero, allowing the dereference to succeed. The compiler, noticing that the pointer had already been dereferenced, assumed that it couldn't be NULL and removed the later NULL test and all the code in that branch. This introduced a security vulnerability, because the function would proceed to use an invalid pointer containing attacker-supplied data. For cases where the pointer was legitimately NULL and the memory wasn't mapped at address zero, the kernel would still OOPS as before. So prior to optimization the code contained one bug; after optimization it contained two, and one of them allowed a local root exploit.
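
    A stripped-down sketch of that shape of bug (the struct and function names are invented, not the actual kernel code):

      #include <stddef.h>

      struct tun_struct { int flags; };

      int sketch(struct tun_struct *tun) {
          int flags = tun->flags;   /* dereference happens first...              */
          if (tun == NULL)          /* ...so the compiler may assume tun != NULL */
              return -1;            /* and delete this whole branch as dead code */
          return flags;
      }

    The Linux kernel now builds with GCC's -fno-delete-null-pointer-checks to disable exactly this transformation.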

    CERT has a presentation called "Dangerous Optimizations and the Loss of Causality" by Robert C. Seacord which lists a lot of optimizations that introduce (or expose) bugs in programs. It discusses the various kinds of optimizations that are possible, from "doing what the hardware does" to "trap all possible undefined behaviour" to "do anything that's not disallowed".

    Some examples of code that's perfectly fine until an aggressively-optimizing compiler gets its hands on it:

    • Checking for overflow (a rewrite that avoids the problem is sketched after this list)

      // fails because the overflow test gets removed
      if (ptr + len < ptr || ptr + len > max) return EINVAL;
      
    • Relying on overflowing arithmetic at all:

      // The compiler optimizes this to an infinite loop
      for (i = 1; i > 0; i += i) ++j;
      
    • Clearing memory of sensitive information:

      // the compiler can remove these "useless writes"
      memset(password_buffer, 0, sizeof(password_buffer));
      

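    For the first snippet, a common rewrite compares the length against the remaining space instead of relying on pointer overflow, so there is no undefined behaviour for the optimizer to exploit. A sketch, assuming ptr <= max on entry (the names follow the snippet above):

      #include <errno.h>
      #include <stddef.h>

      /* Bounds check without overflowing pointer arithmetic.
         Precondition: ptr points into a buffer that ends at max, ptr <= max. */
      int check_range(const char *ptr, const char *max, size_t len) {
          if (len > (size_t)(max - ptr))
              return EINVAL;
          return 0;
      }

    For the second snippet, GCC's -fwrapv defines signed overflow as wrapping, which makes the loop terminate; for the third, see the work-around sketched at the end of this answer.
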
    The problem here is that compilers were, for decades, less aggressive about optimization, and so generations of C programmers learned and internalized things like fixed-size two's complement addition and how it overflows. Then the C language standard is amended by compiler developers, and the subtle rules change, despite the hardware not changing. The C language spec is a contract between developers and compiler writers, but the terms of the agreement are subject to change over time, and not everyone understands every detail, or agrees that the details are even sensible.

    This is why most compilers offer flags to turn off (or turn on) optimizations. Is your program written with the understanding that integers might overflow? Then you should turn off overflow optimizations, because they can introduce bugs. Does your program strictly avoid aliasing pointers? Then you can turn on the optimizations that assume pointers are never aliased. Does your program try to clear memory to avoid leaking information? Oh, in that case you're out of luck: you either need to turn off dead-code-removal or you need to know, ahead of time, that your compiler is going to eliminate your "dead" code, and use some work-around for it.
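
    One such work-around: write the zeros through a volatile-qualified pointer, which the optimizer is not allowed to treat as dead stores (memset_s from C11 Annex K and explicit_bzero exist for the same reason, where available). A minimal sketch:

      #include <stddef.h>

      /* Scrub a buffer in a way the compiler may not elide: each write goes
         through a volatile lvalue, so it counts as an observable side effect
         even though the buffer is never read again. */
      void secure_zero(void *buf, size_t len) {
          volatile unsigned char *p = (volatile unsigned char *)buf;
          while (len--)
              *p++ = 0;
      }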
