Our class was asked this question by the C programming prof:
You are given the code:
int x=1;
printf("%d",++x,x+1);
What output will it always produce ?
It will produce 2 in all environments I can think of. Strict interpretation of the C99 standard however renders the behaviour undefined because the accesses to x do not meet the requirements that exist between sequence points.
Most students said undefined behavior. Can anyone help me understand why it is so?
I will now address the second question, which I understand as "Why do most of the students in my class say that the code shown has undefined behaviour?" and which I think no other poster has answered so far. Some of the students will have remembered examples of expressions with undefined behaviour, such as
f(++i,i)
The code you give fits this pattern, but one might erroneously think that the behaviour is defined anyway because printf ignores the last argument. This nuance confuses many students. Other students will be as well versed in the standard as David Thornley and say "undefined behaviour" for the correct reasons explained above.
The points made about undefined behavior are correct, but there is one additional wrinkle: printf may fail. It's doing file IO; there are any number of reasons it could fail, and it's impossible to eliminate them without knowing the complete program and the context in which it will be executed.
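As a minimal sketch of what checking for that failure could look like (the helper name print_int_checked is made up for illustration and is not part of the original question), the return values of fprintf and fflush can be tested:

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post): writes one int to
   `stream` and reports whether both the write and the flush succeeded. */
int print_int_checked(FILE *stream, int value)
{
    /* fprintf returns a negative value if an output error occurs. */
    if (fprintf(stream, "%d\n", value) < 0)
        return -1;

    /* With buffered output, an error may only show up at flush time. */
    if (fflush(stream) == EOF)
        return -1;

    return 0;
}
```

A caller would treat a return value of -1 as "nothing can be assumed about what was actually written".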
The output is likely to be 2 in every reasonable case. In reality, what you have is undefined behavior though.
Specifically, the standard says:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
There is a sequence point before evaluating the arguments to a function, and a sequence point after all the arguments have been evaluated (but the function not yet called). Between those two (i.e., while the arguments are being evaluated) there is no sequence point (unless an argument is an expression that includes one internally, such as one using the &&, || or , operator).
That means the call to printf is reading the prior value both to determine the value being stored (i.e., the ++x) and to determine the value of the second argument (i.e., the x+1). This clearly violates the requirement quoted above, resulting in undefined behavior.
The fact that you've provided an extra argument for which no conversion specifier is given does not result in undefined behavior. If you supply fewer arguments than conversion specifiers, or if the (promoted) type of the argument disagrees with that of the conversion specifier, you get undefined behavior -- but passing an extra argument does not.
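To illustrate just the "extra argument" part in isolation, here is a small sketch that uses constants, so no object is modified at all. The helper name format_with_extra_argument is an assumption made for this example. Many compilers will warn about the surplus argument, but the call itself is well defined:

```c
#include <stdio.h>

/* Hypothetical helper: one conversion specifier, two arguments.
   The surplus argument (3) is evaluated and then ignored by snprintf. */
int format_with_extra_argument(char *buf, size_t size)
{
    return snprintf(buf, size, "%d", 2, 3);
}
```

With a buffer of, say, 16 bytes, buf ends up holding the string "2" and the function returns 1, the number of characters written.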
The correct answer is: the code produces undefined behavior.
The reason the behavior is undefined is that the two expressions ++x and x + 1 modify x and read x for a reason unrelated to the modification, and these two actions are not separated by a sequence point. This results in undefined behavior in C (and C++). The requirement is given in 6.5/2 of the C language standard.
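For contrast, a sketch of one way to keep the same shape of call without violating 6.5/2 is to give the increment a statement of its own, so that a sequence point separates the modification from the later reads. The helper name format_sequenced is hypothetical, and snprintf is used instead of printf only so the result can be inspected:

```c
#include <stdio.h>

/* Hypothetical helper: the increment is a full expression statement,
   so it is complete before the arguments below are evaluated. */
int format_sequenced(char *buf, size_t size)
{
    int x = 1;

    ++x; /* x becomes 2; the end of the statement is a sequence point */

    /* Both arguments only read x now; the value of x + 1 is ignored
       because the format has a single conversion specifier. */
    return snprintf(buf, size, "%d", x, x + 1);
}
```

Here the result is the string "2" by the rules of the language, not merely by what compilers happen to do.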
Note that the undefined behavior in this case has absolutely nothing to do with the fact that the printf function is given only one format specifier and two actual arguments. Giving more arguments to printf than there are format specifiers in the format string is perfectly legal in C. Again, the problem is rooted in the violation of the expression evaluation requirements of the C language.
Also note that some participants of this discussion fail to grasp the concept of undefined behavior, and insist on mixing it up with the concept of unspecified behavior. To better illustrate the difference, let's consider the following simple example:
int inc_x(int *x) { return ++*x; }
int x_plus_1(int x) { return x + 1; }
int x = 1;
printf("%d", inc_x(&x), x_plus_1(x));
The above code is "equivalent" to the original one, except that the operations that involve our x are wrapped into functions. What is going to happen in this latest example?
There's no undefined behavior in this code. But since the order of evaluation of printf arguments is unspecified, this code produces unspecified behavior, i.e. it is possible that printf will be called as printf("%d", 2, 2) or as printf("%d", 2, 3). In both cases the output will indeed be 2. However, the important difference of this variant is that all accesses to x are wrapped into sequence points present at the beginning and at the end of each function, so this variant does not produce undefined behavior.
This is exactly the reasoning some other posters are trying to force onto the original example. But it cannot be done. The original example produces undefined behavior, which is a completely different beast. They are apparently trying to insist that in practice undefined behavior is always equivalent to unspecified behavior. This is a totally bogus claim that only indicates a lack of expertise in those who make it. The original code produces undefined behavior, period.
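To see which of the two permitted argument lists a given implementation picks for the function-wrapped variant, a sketch along the following lines could be used. The names capture and observe_argument_values are invented for this illustration; inc_x and x_plus_1 are the functions from the example above:

```c
static int inc_x(int *x) { return ++*x; }
static int x_plus_1(int x) { return x + 1; }

/* Hypothetical helper: records the two argument values it receives. */
static void capture(int first, int second, int out[2])
{
    out[0] = first;
    out[1] = second;
}

/* Evaluates the same two argument expressions as the printf call.
   The order of the two evaluations is unspecified, so out[1] may be
   2 or 3, but out[0] is 2 either way. */
void observe_argument_values(int out[2])
{
    int x = 1;
    capture(inc_x(&x), x_plus_1(x), out);
}
```

Whatever the compiler chooses, the result is one of the two outcomes listed above, which is exactly what distinguishes unspecified from undefined behavior.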
To continue with the example, let's modify the previous code sample to
printf("%d %d", inc_x(&x), x_plus_1(x));
the output of the code will become generally unpredictable. It can print 2 2 or it can print 2 3. However, note that even though the behavior is unpredictable, it still is not undefined behavior. The behavior is unspecified, but not undefined. Unspecified behavior is restricted to two possibilities: either 2 2 or 2 3. Undefined behavior is not restricted to anything. It can format your hard drive instead of printing something. Feel the difference.
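If a single predictable output is wanted, one option (sketched here under the assumption that the left-to-right result is the desired one) is to evaluate the two calls in separate statements before formatting. The helper name format_in_fixed_order is hypothetical:

```c
#include <stdio.h>

static int inc_x(int *x) { return ++*x; }
static int x_plus_1(int x) { return x + 1; }

/* Hypothetical helper: each call is its own statement, so the order
   of evaluation is fixed by the source text. */
int format_in_fixed_order(char *buf, size_t size)
{
    int x = 1;
    int first = inc_x(&x);    /* x is now 2, first is 2 */
    int second = x_plus_1(x); /* reads the updated x, second is 3 */

    return snprintf(buf, size, "%d %d", first, second);
}
```

This version always yields "2 3"; swapping the two statements would instead always yield "2 2".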
Any time the behavior of a program is undefined, anything can happen — the classical phrase is that "demons may fly out of your nose" — although most implementations don't go that far.
The arguments of a function are conceptually evaluated in parallel (the technical term is that there is no sequence point between their evaluation). That means the expressions ++x and x+1 may be evaluated in this order, in the opposite order, or in some interleaved way. When you modify a variable and try to access its value in parallel, the behavior is undefined.
With many implementations, the arguments are evaluated in sequence (though not always from left to right). So you're unlikely to see anything but 2 in the real world.
However, a compiler could generate code like this:
1. Load x into r1.
2. Calculate x+1 by adding 1 to r1.
3. Calculate ++x by adding 1 to r1. That's ok because x has been loaded into r1. Given how the compiler was designed, step 2 cannot have modified r1, because that could only happen if x was read as well as written between two sequence points, which is forbidden by the C standard.
4. Store r1 into x.
And on this (hypothetical, but correct) compiler, the program would print 3.
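The hypothetical code generation described above can be imitated in ordinary, well-defined C to check the arithmetic. This is only a simulation under my own reading of the steps: r1 is a plain variable standing in for a register, I assume that "adding 1 to r1" updates r1 in place, and the function name is made up:

```c
/* Simulation of the hypothetical compiler; not real generated code. */
int simulate_hypothetical_compiler(void)
{
    int x = 1;

    int r1 = x;          /* load x into r1:              r1 == 1 */
    r1 = r1 + 1;         /* "x+1" computed in r1:        r1 == 2 */
    int second_arg = r1; /* value passed for x+1 (ignored by "%d") */
    r1 = r1 + 1;         /* "++x" computed in r1 again:  r1 == 3 */
    int first_arg = r1;  /* value passed for ++x */
    x = r1;              /* store r1 into x:             x == 3 */

    (void)second_arg;    /* unused, kept only to mirror the call */
    (void)x;
    return first_arg;    /* the value "%d" would print */
}
```

The function returns 3, matching the output claimed for the hypothetical compiler.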
(EDIT: passing an extra argument to printf is correct (§7.19.6.1-2 in N1256); thanks to Prasoon Saurav for pointing this out. Also: added an example.)
Most students said undefined behavior. Can anyone help me understand why it is so?
Because the order in which function arguments are evaluated is not specified.