In many discussions about undefined behavior (UB), the point of view has been put forward that the mere presence in a program of any construct that has UB in a p
There's a clear divide between inherent undefined behaviour, such as n=n++ (in C, and in C++ before C++17), and code that can have defined or undefined behaviour depending on the program state at runtime, such as x/y for ints. In the latter case the program is required to work unless y is 0 (or the quotient is unrepresentable, as with INT_MIN / -1). In the former case, however, the compiler is asked to generate code that's totally illegitimate. It's within its rights to refuse to compile. Alternatively, it may just not be "bullet proofed" against such code, so its optimiser state (register allocations, records of which values may have been modified since they were read, etc.) gets corrupted, resulting in bogus machine code for that and the surrounding source code. For instance, early analysis might have recognised an "a=b++" situation and generated code for the preceding if to jump over a two-byte instruction, but when n=n++ is encountered no instruction is output, so the if statement jumps somewhere into the middle of the following opcodes. Either way, it's simply game over. Putting an "if" in front, or even wrapping the construct in a different function, isn't documented as "containing" the undefined behaviour. Bits of code aren't tainted with undefined behaviour: the Standard consistently says "the program has undefined behaviour".
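To make the runtime-dependent case concrete, here is a minimal C++ sketch (the name checked_div is hypothetical). The division x / y appears in the source unconditionally, but it is only evaluated on inputs where it is defined. Note that for int, INT_MIN / -1 is undefined as well as division by zero, because the quotient is not representable.

```cpp
#include <climits>
#include <optional>

// Hypothetical helper: x / y is defined for most inputs, so the undefined
// cases are rejected before the division is ever evaluated.
std::optional<int> checked_div(int x, int y) {
    if (y == 0) {
        return std::nullopt;            // division by zero: UB if evaluated
    }
    if (x == INT_MIN && y == -1) {
        return std::nullopt;            // quotient not representable: UB if evaluated
    }
    return x / y;                       // only reached on inputs where it is defined
}
```

Returning std::optional (C++17) is just one way to report the rejected inputs. An error code or an exception would serve equally well.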
In the general case the best we can say here is that it depends.
One case where the answer is no is when dealing with indeterminate values. The latest draft clearly makes it undefined behavior to produce an indeterminate value during an evaluation, with some exceptions, but the code sample shows how subtle this can be:
[ Example:
int f(bool b) {
  unsigned char c;
  unsigned char d = c;  // OK, d has an indeterminate value
  int e = d;            // undefined behavior
  return b ? d : 0;     // undefined behavior if b is true
}
— end example ]
So this line of code:
return b ? d : 0;
is only undefined if b is true. This seems to be the intuitive reading, and it appears to be how John Regehr sees it as well, judging by his post It’s Time to Get Serious About Exploiting Undefined Behavior.
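As a sketch under the wording quoted above (the rules for indeterminate values of unsigned narrow character types), removing the unconditional `int e = d;` line leaves a function whose only problematic read sits on the path where b is true, so a call with false should be well-defined. The name f_without_e is hypothetical, and compilers may warn that c is used uninitialized.

```cpp
// Hypothetical variant of the standard's example with `int e = d;` removed,
// so the only read of the indeterminate value is on the b == true path.
int f_without_e(bool b) {
    unsigned char c;        // deliberately left uninitialized
    unsigned char d = c;    // OK under the quoted wording: d is indeterminate
    return b ? d : 0;       // d is evaluated (and converted to int) only if b is true
}
```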
In the following case the answer is yes: the code is erroneous even though we never call the function that would invoke undefined behavior:
constexpr const char *str = "Hello World" ;
constexpr char access()
{
return str[100] ;
}
int main()
{
}
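clang chooses to diagnose access as erroneous even though it is never invoked (see it live). It is allowed to do so because, before C++23, a constexpr function for which no argument values could make an invocation a constant expression renders the program ill-formed, no diagnostic required ([dcl.constexpr]).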
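For contrast, a bounds-checked variant is accepted, because at least some arguments make a call a valid constant expression. This sketch uses the hypothetical names kStr and checked_access. It stores the string in an array so that sizeof gives its length including the terminating '\0'. An out-of-range index selects the throw branch, which is not permitted in a constant expression.

```cpp
#include <cstddef>
#include <stdexcept>

constexpr char kStr[] = "Hello World";   // 11 characters plus the terminating '\0'

// Hypothetical bounds-checked counterpart of access(). In-range indices give a
// constant expression, so the function itself is well-formed. An out-of-range
// index reaches the throw: that fails to compile if constant-evaluated and
// throws std::out_of_range if evaluated at run time.
constexpr char checked_access(std::size_t i) {
    return i < sizeof(kStr) ? kStr[i] : throw std::out_of_range("index past end of kStr");
}

static_assert(checked_access(4) == 'o', "evaluated entirely at compile time");
```

Using `checked_access(100)` in a `static_assert` would therefore fail to compile, while the same call at run time throws.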
In the dialect processed by gcc with full optimizations enabled, if a program contains two constructs which would behave identically in cases where both are defined, reliable program operation requires that any code that would switch among them only be executed in cases where both are defined. For example, when optimizations are enabled, both ARM gcc 9.2.1 and x86-64 gcc 10.1 will process the following source:
#include <limits.h>
#if LONG_MAX == 0x7FFFFFFF
typedef int longish;
#else
typedef long long longish;
#endif
long test(long *x, long *y)
{
if (*x)
{
if (x==y)
*y = 1;
else
*(longish*)y = 1;
}
return *x;
}
into machine code that will test whether x and y are equal, set *x to 1 if they are and *y to 1 if they aren't, but return the previous value of *x in either case. For the purpose of determining whether anything might affect *x, gcc decides that both branches of the if are equivalent, and thus only evaluates the "false" branch. Since that branch can't affect *x, it concludes that the if as a whole can't either. That determination is unswayed by its own observation that, on the true branch, the write to *y can be replaced with a write to *x.
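In the code above, the else branch stores through a longish* into a long object, which the type-based aliasing rules do not allow, because longish is int or long long rather than long. If the intent is to store a longish value into *y, one way to avoid the type-punned pointer is to copy the bytes with std::memcpy, the conventional well-defined way to write one type's object representation into an object of another type of the same size. The following C++ sketch works under that assumption. The name test_memcpy is hypothetical, the static_assert encodes the assumption that longish and long have the same size (true for the typedefs above on common platforms), and it has not been checked against the specific gcc versions mentioned.

```cpp
#include <climits>
#include <cstring>

#if LONG_MAX == 0x7FFFFFFF
typedef int longish;
#else
typedef long long longish;
#endif

static_assert(sizeof(longish) == sizeof(long), "sketch assumes matching sizes");

// Hypothetical alias-safe variant: memcpy writes bytes instead of performing
// a longish-typed store, so the compiler cannot use the store's type to
// conclude that it leaves *x unchanged.
long test_memcpy(long *x, long *y)
{
    if (*x)
    {
        if (x == y)
            *y = 1;
        else
        {
            longish one = 1;
            std::memcpy(y, &one, sizeof one);   // same size, so *y becomes 1
        }
    }
    return *x;
}
```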
It should be, if not "shall".
Behavior, as defined by ISO C (I found no corresponding definition in ISO C++, but it should still apply in some way), is:
3.4
1 behavior
external appearance or action
And UB:
WG21/N4527
1.3.25 [defns.undefined]
undefined behavior
behavior for which this International Standard imposes no requirements [ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. —end note ]
Despite "to behaving during translation" above, the word "behavior" used by ISO C++ is mainly about the execution of programs.
WG21/N4527
1.9 Program execution [intro.execution]
1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.[5]
2 Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects.[6] Such documentation shall define the instance of the abstract machine that corresponds to that implementation (referred to as the “corresponding instance” below).
3 Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, evaluation of expressions in a new-initializer if the allocation function fails to allocate memory (5.3.4)). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine. An instance of the abstract machine can thus have more than one possible execution for a given program and a given input.
4 Certain other operations are described in this International Standard as undefined (for example, the effect of attempting to modify a const object). [ Note: This International Standard imposes no requirements on the behavior of programs that contain undefined behavior. —end note ]
5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
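A commonly cited consequence of paragraph 5 is that an implementation may assume no undefined operation occurs and optimize accordingly. In the following hypothetical C++ pair, read_then_check dereferences p before testing it, so an optimizer may treat the later null test as dead code (GCC documents this behavior under its -fdelete-null-pointer-checks option). check_then_read performs the test first, so every execution is defined.

```cpp
// Hypothetical illustration. The dereference comes before the null test; in
// any execution where p is null, the dereference is an undefined operation,
// so an optimizer may assume p is non-null and drop the later check.
int read_then_check(const int *p)
{
    int v = *p;
    if (p == nullptr)
        return -1;          // may be removed: only reachable after UB
    return v;
}

// Testing before the dereference keeps every execution well-defined.
int check_then_read(const int *p)
{
    if (p == nullptr)
        return -1;
    return *p;
}
```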
5) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.
6) This documentation also includes conditionally-supported constructs and locale-specific behavior. See 1.4.
It is clear that undefined behavior is caused by specific language constructs used wrongly or in a non-portable way (that is, in a way that does not conform to the standard). However, the standard says nothing about which specific portion of code in a program causes it. In other words, "having undefined behavior" is a property (concerning conformance) of the whole program being executed, not of any smaller part of it.
The standard could have given a stronger guarantee, making the behavior well-defined as long as some specific code is not executed, but only if there were a way to map C++ code precisely to its corresponding behavior. This is hard (if not impossible) without a detailed semantic model of execution. In short, the operational semantics given by the abstract machine model above is not enough to support the stronger guarantee. In any case, ISO C++ is never going to be a specification in the style of the JVMS or ECMA-335, and I don't expect a complete set of formal semantics describing the language to appear.
A key problem here is the meaning of "execution". Some people think "executing a program" simply means running the program. This is not quite true. Note that the representation of the program executed in the abstract machine is not specified. (Also note that "this International Standard places no requirement on the structure of conforming implementations".) The code being executed here can literally be C++ code (not necessarily machine code or some other form of intermediate code, neither of which the standard specifies at all). This effectively allows the core language to be implemented as an interpreter, an online partial evaluator, or some other monster that translates C++ code on the fly. As a result, there is actually no way to separate the phases of translation (defined by ISO C++ [lex.phases]) completely from the process of execution without knowledge of specific implementations. Thus, it is necessary to allow UB to occur during translation when it is too difficult to specify portable well-defined behavior.
Besides the problems above, for most ordinary users one (non-technical) reason is probably enough: it is simply unnecessary to provide the stronger guarantee. Doing so would tolerate bad code and defeat one of the (probably most important) useful aspects of UB itself: encouraging programmers to quickly throw away (unnecessarily) nonportable, smelly code rather than spending effort to "fix" it, effort that would eventually be in vain.
Additional notes:
Some of the wording is copied and adapted from one of my replies to this comment.
In the context of a safety-critical embedded system, the posted code would be considered defective:
If a side effect on a scalar object is unsequenced relative to etc
Side effects are changes in the state of the execution environment (1.9/12). A change is a change, not an expression that, if evaluated, would potentially produce a change. If there is no change, there is no side effect. If there is no side effect, then no side effect is unsequenced relative to anything else.
This does not mean that any code which is never executed is UB-free (though I'm pretty sure most of it is). Each occurrence of UB in the standard needs to be examined separately. (The stricken-out text is probably overly cautious; see below).
The standard also says that
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
(emphasis mine)
This, as far as I can tell, is the only normative reference that says what the phrase "undefined behavior" means: an undefined operation in a program execution. No execution, no UB.
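This reading, that an undefined operation only matters if an execution actually performs it, is what ordinary guarded code relies on. In this minimal sketch (element_or is a hypothetical name), the expression a[i] would be a null or out-of-bounds access for some inputs. Short-circuit evaluation of && and the conditional operator ensure it is evaluated only after the checks pass.

```cpp
#include <cstddef>

// Hypothetical helper: a[i] would be an undefined operation if a were null or
// i >= n, but it is evaluated only after both checks succeed, so no execution
// of this function performs an invalid read.
int element_or(const int *a, std::size_t n, std::size_t i, int fallback)
{
    return (a != nullptr && i < n) ? a[i] : fallback;
}
```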