When I compile and run this code with Clang (-O3) or MSVC (/O2)...

#include <cstdio>
#include <ctime>

static const int N = 0x8000;

// ...
        ++a[j]; // This is undefined behavior too, but Clang doesn't see it
Are you saying this is undefined behavior because the array elements are uninitialized?
If so, although this is a common interpretation of clause 4.1/1 in the standard, I believe it is incorrect. The elements are 'uninitialized' in the sense that programmers usually use this term, but I do not believe this corresponds exactly to the C++ specification's use of the term.
In particular, C++11 8.5/11 states that these objects are in fact default initialized, and this seems to me to be mutually exclusive with being uninitialized. The standard also states that for some objects being default initialized means that 'no initialization is performed'. Some might assume this means that they are uninitialized, but this is not specified, and I simply take it to mean that no such act of initialization is required.
The spec does make clear that the array elements will have indeterminate values. C++ specifies, by reference to the C standard, that indeterminate values can be either valid representations, legal to access normally, or trap representations. If the particular indeterminate values of the array elements happen to all be valid representations (and none is INT_MAX, avoiding overflow), then the above line does not trigger any undefined behavior in C++11.
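On that reading, the situation is roughly the following (a small sketch of mine, not code from the question):

int a[4];      // default-initialized: every element holds an indeterminate value
++a[0];        // C++11: fine if a[0] happens to hold a valid representation
               // other than INT_MAX; undefined only if it is a trap representation
int b[4] = {}; // value-initialized: every element is 0
++b[0];        // always well-defined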
Since these array elements could be trap representations, it would be perfectly conformant for clang to act as though they are guaranteed to be trap representations, effectively choosing to make the code UB in order to create an optimization opportunity.
Even if clang doesn't do that, it could still choose to optimize based on the dataflow. Clang does know how to do that, as demonstrated by the fact that if the inner loop is changed slightly then the loops do get removed.
So then why does the (optional) presence of UB seem to stymie optimization, when UB is usually taken as an opportunity for more optimization?
What may be going on is that clang has decided that users want int trapping based on the hardware's behavior. And so rather than taking traps as an optimization opportunity, clang has to generate code which faithfully reproduces the program behavior in hardware. This means that the loops cannot be eliminated based on dataflow, because doing so might eliminate hardware traps.
C++14 updates the behavior such that accessing indeterminate values itself produces undefined behavior, independent of whether one considers the variable uninitialized or not: https://stackoverflow.com/a/23415662/365496
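For illustration (my sketch of the C++14 rule, not code from that answer):

int x;               // x has an indeterminate value
int y = x;           // C++14: undefined behavior, because the evaluation of x
                     // produces an indeterminate value
unsigned char c;     // also indeterminate
unsigned char d = c; // OK: unsigned narrow character types are the carve-out;
                     // d simply gets an indeterminate value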
That is indeed very interesting. I tried your example with MSVC 2013. My first idea was that the fact that ++a[j] is somewhat undefined is the reason why the loop is not removed, because removing it would definitely change the meaning of the program from an undefined/incorrect semantic to something meaningful, so I tried initializing the values first, but the loops still did not disappear.
Afterwards I replaced the ++a[j]; with an a[j] = 0;, which then produced output without any loop, so everything between the two calls to clock() was removed. I can only guess at the reason; perhaps the optimizer is for some reason not able to prove that operator++ has no side effects.
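Roughly, the two experiments look like this (a reconstruction assuming the question's N and surrounding main(), not the commenter's exact code):

int a[N] = {};              // initializing first: MSVC 2013 still kept the loops
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        ++a[j];

for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        a[j] = 0;           // with this body, everything between the two
                            // clock() calls was removed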
The undefined behavior is irrelevant here. Replacing the inner loop with:
for (int j = 1; j < N; ++j)
{
    a[j-1] = a[j];
    a[j] = j;
}
... has the same effect, at least with Clang.
The issue is that the inner loop both loads from a[j] (for some j) and stores to a[j] (for some j). None of the stores can be removed, because the compiler believes they may be visible to later loads, and none of the loads can be removed, because their values are used (as input to the later stores). As a result, the loop still has side-effects on memory, so the compiler doesn't see that it can be deleted.
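To spell the dependence out (my annotation, not anything the compiler reports), ++a[j] decomposes into a load-modify-store per iteration:

for (int j = 0; j < N; ++j)
{
    int tmp = a[j]; // load: cannot be dropped, its value feeds the store below
    a[j] = tmp + 1; // store: cannot be dropped, a later load of a[j] (the next
                    // outer iteration, or code after the loops) might observe it
}

Since each half of the pair keeps the other alive, the loop retains its side-effects on memory.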
Contrary to n.m.'s answer, replacing int with unsigned does not make the problem go away. The code generated by Clang 3.4.1 using int and using unsigned int is identical.
It's an interesting issue with regard to optimization. I would expect that in most cases, the compiler would treat each element of the array as an individual variable when doing dead code analysis. And 0x8000 makes too many individual variables to track, so the compiler doesn't try. The fact that a[j] doesn't always access the same object could cause problems for the optimizer as well.
Obviously, different compilers use different heuristics; a compiler could treat the array as a single object, and detect that it never affected output (observable behavior). Some compilers may choose not to, however, on the grounds that typically, it's a lot of work for very little gain: how often would such optimizations be applicable in real code?
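As a rough sketch of the "individual variables" point (my example; actual compiler behaviour will vary):

int x = 0;                      // a single scalar is easy to track: the increments
for (int i = 0; i < N; ++i)     // are never read afterwards, so the whole nest is
    for (int j = 0; j < N; ++j) // typically deleted as dead code
        ++x;

int a[N];                       // 0x8000 elements means 0x8000 separate "variables"
for (int i = 0; i < N; ++i)     // to prove dead, which the optimizer may simply
    for (int j = 0; j < N; ++j) // decline to attempt
        ++a[j];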