A huge number of operations in C++ result in undefined behavior, where the spec is completely mute about what the program\'s behavior ought to be and allows for anything to
Using g++
-Wall -Werror -pedantic-error
(preferably with an appropriate -std
argument as well) will pick up quite a few case of U.B.
Things that -Wall
gets you include:
-pedantic
Issue all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++. For ISO C, follows the version of the ISO C standard specified by any -std option used.-Winit-self (C, C++, Objective-C and Objective-C++ only)
Warn about uninitialized variables which are initialized with themselves. Note this option can only be used with the -Wuninitialized option, which in turn only works with -O1 and above.-Wuninitialized
Warn if an automatic variable is used without first being initialized or if a variable may be clobbered by a "setjmp" call.
and various disallowed things you can do with specifiers to printf
and scanf
family functions.
You might want to read about SAFECode.
This is a research project from the University of Illinois, the goal is stated on the front page (linked above):
The purpose of the SAFECode project is to enable program safety without garbage collection and with minimal run-time checks using static analysis when possible and run-time checks when necessary. SAFECode defines a code representation with minimal semantic restrictions designed to enable static enforcement of safety, using aggressive compiler techniques developed in this project.
What is really interesting to me is the elimination of the runtime checks whenever the program can be proved to be correct statically, for example:
int array[N];
for (i = 0; i != N; ++i) { array[i] = 0; }
Should not incur any more overhead than the regular version.
In a lighter fashion, Clang has some guarantees about undefined behavior too as far as I recall, but I cannot get my hands on it...
Unfortunately I'm not aware of any such tool. Typically UB is defined as such precisely because it would be hard or impossible for a compiler to diagnose it in all cases.
In fact your best tool is probably compiler warnings: They often warn about UB type items (for example, non-virtual destructor in base classes, abusing the strict-aliasing rules, etc).
Code review can also help catch cases where UB is relied upon.
Then you have to rely on valgrind to capture the remaining cases.
This is a great question, but let me give an idea for why I think it might be impossible (or at least very hard) in general.
Presumably, such an implementation would almost be a C++ interpreter, or at least a compiler for something more like Lisp or Java. It would need to keep extra data for each pointer to ensure you did not perform arithmetic outside of an array or dereference something that was already freed or whatever.
Now, consider the following code:
int *p = new int;
delete p;
int *q = new int;
if (p == q)
*p = 17;
Is the *p = 17
undefined behavior? On the one hand, it dereferences p
after it has been freed. On the other hand, dereferencing q
is fine and p == q
...
But that is not really the point. The point is that whether the if
evaluates to true at all depends on the details of the heap implementation, which can vary from implementation to implementation. So replace *p = 17
by some actual undefined behavior, and you have a program that might very well blow up on a normal compiler but run fine on your hypothetical "UB detector". (A typical C++ implementation will use a LIFO free list, so the pointers have a good chance of being equal. A hypothetical "UB detector" might work more like a garbage collected language in order to detect use-after-free problems.)
Put another way, the existence of merely implementation-defined behavior makes it impossible to write a "UB detector" that works for all programs, I suspect.
That said, a project to create an "uber-strict C++ compiler" would be very interesting. Let me know if you want to start one. :-)
The clang
compiler can detect some undefined behaviors and warn against them. Probably not as complete as you want, but it's definitely a good start.
John Regehr in Finding Undefined Behavior Bugs by Finding Dead Code points out a tool called STACK and I quote from the site (emphasis mine):
Optimization-unstable code (unstable code for short) is an emerging class of software bugs: code that is unexpectedly eliminated by compiler optimizations due to undefined behavior in the program. Unstable code is present in many systems, including the Linux kernel and the Postgres database server. The consequences of unstable code range from incorrect functionality to missing security checks.
STACK is a static checker that detects unstable code in C/C++ programs. Applying STACK to widely used systems has uncovered 160 new bugs that have been confirmed and fixed by developers.
Also in C++11 for the case of constexpr variables and functions undefined behavior should be caught at compile time.
We also have gcc ubsan:
GCC recently (version 4.9) gained Undefined Behavior Sanitizer (ubsan), a run-time checker for the C and C++ languages. In order to check your program with ubsan, compile and link the program with -fsanitize=undefined option. Such instrumented binaries have to be executed; if ubsan detects any problem, it outputs a “runtime error:” message, and in most cases continues executing the program.
and Clang Static Analyzer which includes many checks for undefined behavior. For example clangs
-fsanitize checks which includes -fsanitize=undefined
:
-fsanitize=undefined: Fast and compatible undefined behavior checker. Enables the undefined behavior checks that have small runtime cost and no impact on address space layout or ABI. This includes all of the checks listed below other than unsigned-integer-overflow.
and for C we can look at his article It’s Time to Get Serious About Exploiting Undefined Behavior which says:
[..]I confess to not personally having the gumption necessary for cramming GCC or LLVM through the best available dynamic undefined behavior checkers: KCC and Frama-C.[...]
Here is a link to kcc and I quote:
[...]If you try to run a program that is undefined (or one for which we are missing semantics), the program will get stuck. The message should tell you where it got stuck and may give a hint as to why. If you want help deciphering the output, or help understanding why the program is undefined, please send your .kdump file to us.[...]
and here are a link to Frama-C, an article where the first use of Frama-C as a C interpreter is described and an addendum to the article.