C++ and PHP vs C# and Java - unequal results

Question

I found something a little strange in C# and Java. Let's look at this C++ code:

#include <cstdio>
using namespace std;

class Simple
{
public:
    static int f()
    {
        X = X + 10;
        return 1;
    }

    static int X;
};
int Simple::X = 0;

int main() {
    Simple::X += Simple::f();
    printf("X = %d", Simple::X);
    return 0;
}

In a console you will see X = 11 (Look at the result here - IdeOne C++).

Now let's look at the same code in C#:

class Program
{
    static int x = 0;

    static int f()
    {
        x = x + 10;
        return 1;
    }

    public static void Main()
    {
        x += f();
        System.Console.WriteLine(x);
    }
}

In a console you will see 1 (not 11!) (look at the result here - IdeOne C#). I know what you're thinking now: "How is that possible?" But let's move on to the following code.

Java code:

import java.util.*;
import java.lang.*;
import java.io.*;

/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
    static int X = 0;
    static int f()
    {
        X = X + 10;
        return 1;
    }
    public static void main (String[] args) throws java.lang.Exception
    {
        Formatter f = new Formatter();
        f.format("X = %d", X += f());
        System.out.println(f.toString());
    }
}

The result is the same as in C# (X = 1, look at the result here).

And finally, let's look at the PHP code:

<?php
class Simple
{
    public static $X = 0;

    public static function f()
    {
        self::$X = self::$X + 10;
        return 1;
    }
}

$simple = new Simple();
echo "X = " . $simple::$X += $simple::f();
?>

The result is 11 (look at the result here).

I have a little theory: these languages (C# and Java) make a local copy of the static variable X on the stack (are they ignoring the static keyword?). That would be the reason why the result in those languages is 1.

Does anybody have another explanation?

Answer 1:

The C++ standard states:

With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. —end note ]

§5.17 [expr.ass]

Hence, because the same evaluation both uses X and calls a function with a side effect on X, the result is undefined:

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

§1.9 [intro.execution]

It happens to be 11 on many compilers, but there is no guarantee that a C++ compiler won't give you 1, as the other languages do.

If you're still skeptical, another analysis of the standard leads to the same conclusion. The standard also says, in the same section as above:

The behavior of an expression of the form E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.

In your case that is X = X + f(), except that X is evaluated only once.
As there is no guarantee on the order of evaluation in X + f(), you cannot take for granted that f is evaluated first and X second.

Addendum

I'm not a Java expert, but the Java rules clearly specify the order of evaluation in an expression, which is guaranteed to be from left to right by section 15.7 of the Java Language Specification. In section 15.26.2, Compound Assignment Operators, the Java specification also says that E1 op= E2 is equivalent to E1 = (T) ((E1) op (E2)).

In your Java program this means again that your expression is equivalent to X = X + f() and first X is evaluated, then f(). So the side effect of f() is not taken into account in the result.

So your Java compiler doesn't have a bug. It just complies with the specifications.
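
The equivalence described above can be checked with a minimal Java sketch. The class and method names (CompoundAssignmentDemo, viaCompound, viaExpanded) are hypothetical and chosen only for illustration; f() mirrors the function from the question.

```java
// Hypothetical demo class; names are illustrative, not from the question.
class CompoundAssignmentDemo {
    static int x;

    // Side effect on x, then returns 1 (mirrors f() from the question).
    static int f() {
        x = x + 10;
        return 1;
    }

    // x += f(): the old value of x (0) is saved before f() runs.
    static int viaCompound() {
        x = 0;
        x += f();
        return x;
    }

    // Expanded form x = x + f(): the left operand x is read first, then f() runs.
    static int viaExpanded() {
        x = 0;
        x = x + f();
        return x;
    }

    public static void main(String[] args) {
        System.out.println(viaCompound()); // prints 1
        System.out.println(viaExpanded()); // prints 1
    }
}
```

Both forms read X before calling f(), so the 10 added inside f() is overwritten by the final store.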

Answer 2:

Thanks to comments by Deduplicator and user694733, here is a modified version of my original answer.

The C++ version has unspecified (not, as I originally wrote, undefined) behaviour.

There is a subtle difference between "undefined" and "unspecified", in that the former allows a program to do anything (including crashing) whereas the latter allows it to choose from a set of particular allowed behaviours without dictating which choice is correct.

Except in very rare cases, you will always want to avoid both.

A good starting point for understanding the whole issue is the set of C++ FAQ entries "Why do some people think x = ++y + y++ is bad?", "What's the value of i++ + i++?" and "What's the deal with 'sequence points'?":

Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.

(...)

Basically, in C and C++, if you read a variable twice in an expression where you also write it, the result is undefined.

(...)

At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (...) The “certain specified points” that are called sequence points are (...) after evaluation of all a function’s parameters but before the first expression within the function is executed.

In short, modifying a variable twice between two consecutive sequence points yields undefined behaviour, but a function call introduces an intermediate sequence point (actually, two intermediate sequence points, because the return statement creates another one).

This means the fact that you have a function call in your expression "saves" your Simple::X += Simple::f(); line from being undefined and turns it into "only" unspecified.

Both 1 and 11 are possible and correct outcomes, whereas printing 123, crashing or sending an insulting e-mail to your boss are not allowed behaviours; you'll just never get a guarantee whether 1 or 11 will be printed.

The following example is slightly different. It's seemingly a simplification of the original code but really serves to highlight the difference between undefined and unspecified behaviour:

#include <iostream>

int main() {
    int x = 0;
    x += (x += 10, 1);
    std::cout << x << "\n";
}

Here the behaviour is indeed undefined, because the function call has gone away, so both modifications of x occur between two consecutive sequence points. The compiler is allowed by the C++ language specification to create a program which prints 123, crashes or sends an insulting e-mail to your boss.

(The e-mail thing of course is just a very common humorous attempt at explaining how undefined really means anything goes. Crashes are often a more realistic result of undefined behaviour.)

In fact, the , 1 (just like the return statement in your original code) is a red herring. The following yields undefined behaviour, too:

#include <iostream>

int main() {
    int x = 0;
    x += (x += 10);
    std::cout << x << "\n";
}

This may print 20 (it does so on my machine with VC++ 2013) but the behaviour is still undefined.

(Note: this applies to built-in operators. Operator overloading changes the behaviour back to specified, because overloaded operators copy the syntax from the built-in ones but have the semantics of functions, which means that an overloaded += operator of a custom type that appears in an expression is actually a function call. Therefore, not only are sequence points introduced but the entire ambiguity goes away, the expression becoming equivalent to x.operator+=(x.operator+=(10));, which has guaranteed order of argument evaluation. This is probably irrelevant to your question but should be mentioned anyway.)

In contrast, the Java version

import java.io.*;

class Ideone
{
    public static void main(String[] args)
    {
        int x = 0;
        x += (x += 10);
        System.out.println(x);
    }
}

must print 10. This is because Java has neither undefined nor unspecified behaviour with regards to evaluation order. There are no sequence points to be concerned about. See Java Language Specification 15.7. Evaluation Order:

The Java programming language guarantees that the operands of operators appear to be evaluated in a specific evaluation order, namely, from left to right.

So in the Java case, x += (x += 10), interpreted from left to right, means that first something is added to 0, and that something is 0 + 10. Hence 0 + (0 + 10) = 10.

See also example 15.7.1-2 in the Java specification.
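
A short Java sketch in the spirit of that example makes the saved left operand visible. The class name and the values 9 and 3 are illustrative assumptions; the last method repeats the x += (x += 10) case from this answer.

```java
// Hypothetical demo class; values are chosen for illustration only.
class NestedAssignmentDemo {
    // Compound form: the left operand (9) is saved before a = 3 runs.
    static int compound() {
        int a = 9;
        a += (a = 3);
        return a; // 9 + 3
    }

    // Expanded form: b is read (9) first, then b = 3 is evaluated.
    static int expanded() {
        int b = 9;
        b = b + (b = 3);
        return b; // 9 + 3
    }

    // The case discussed in this answer: 0 + (0 + 10).
    static int fromAnswer() {
        int x = 0;
        x += (x += 10);
        return x;
    }

    public static void main(String[] args) {
        System.out.println(compound());   // prints 12
        System.out.println(expanded());   // prints 12
        System.out.println(fromAnswer()); // prints 10
    }
}
```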

Going back to your original example, this also means that the more complex example with the static variable has defined and specified behaviour in Java.

Honestly, I don't know about C# and PHP, but I would guess that both of them have some guaranteed evaluation order as well. C++ (like C) tends to allow much more undefined and unspecified behaviour than most other programming languages. That's neither good nor bad; it's a tradeoff between robustness and efficiency. Choosing the right programming language for a particular task or project is always a matter of analysing tradeoffs.

In any case, expressions with such side effects are bad programming style in all four languages.
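
One way to avoid relying on evaluation order at all is to split the expression so that each read and each call gets its own statement. The following Java sketch assumes hypothetical names (ExplicitOrderDemo, callThenAdd, readThenCall); the same restructuring applies in the other three languages.

```java
// Hypothetical demo class: the intended order is spelled out with temporaries.
class ExplicitOrderDemo {
    static int x;

    // Side effect on x, then returns 1 (mirrors f() from the question).
    static int f() {
        x = x + 10;
        return 1;
    }

    // Intent: let f() update x first, then add its return value.
    static int callThenAdd() {
        x = 0;
        int result = f(); // x is now 10
        x += result;      // 10 + 1
        return x;
    }

    // Intent: use the value x had before the call.
    static int readThenCall() {
        x = 0;
        int before = x;       // 0
        int result = f();     // x is now 10, but 'before' is unaffected
        x = before + result;  // 0 + 1
        return x;
    }

    public static void main(String[] args) {
        System.out.println(callThenAdd());  // prints 11
        System.out.println(readThenCall()); // prints 1
    }
}
```

Written this way, the reader does not need to know the language's evaluation rules to predict the result.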

One final word:

I found a little bug in C# and Java.

You should not assume that you have found a bug in a language specification or compiler unless you have many years of professional experience as a software engineer.

Answer 3:

As Christophe has already written, this is basically an undefined operation.

So why do C++ and PHP do it one way, and C# and Java the other way?

In this case (which may be different for different compilers and platforms), the order of evaluation of arguments in C++ is inverted compared to C# - C# evaluates arguments in order of writing, while the C++ sample does it the other way around. This boils down to the default calling conventions both use, but again - for C++, this is an undefined operation, so it may differ based on other conditions.

To illustrate, this C# code:

class Program
{
    static int x = 0;

    static int f()
    {
        x = x + 10;
        return 1;
    }

    public static void Main()
    {
        x = f() + x;
        System.Console.WriteLine(x);
    }
}

Will produce 11 on output, rather than 1.

That's simply because C# evaluates "in order", so in your example, it first reads x and then calls f(), while in mine, it first calls f() and then reads x.
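
The "in order" rule can be observed directly by logging when each operand is evaluated. The sketch below uses Java rather than C#, on the assumption that the comparison carries over, since Java gives the same left-to-right guarantee (JLS 15.7). The class, the operand helper and the log are all hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracing helper: records the order in which operands are evaluated.
class EvaluationOrderTrace {
    static final List<String> log = new ArrayList<>();

    // Records that an operand was evaluated and passes its value through.
    static int operand(String name, int value) {
        log.add(name);
        return value;
    }

    // Evaluates left + right and returns a copy of the recorded order.
    static List<String> trace() {
        log.clear();
        int sum = operand("left", 1) + operand("right", 2);
        log.add("sum=" + sum);
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(trace()); // prints [left, right, sum=3]
    }
}
```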

Now, this still might be unreliable. IL (.NET's bytecode) treats + pretty much like any other method, but optimizations by the JIT compiler might result in a different order of evaluation. On the other hand, C# (and .NET) does define the order of evaluation / execution, so I guess a compliant compiler should always produce this result.

In any case, that's a lovely unexpected outcome you've found, and a cautionary tale - side-effects in methods can be a problem even in imperative languages :)

Oh, and of course - static means something different in C# vs. C++. I've seen that mistake made by C++ers coming to C# before.

EDIT:

Let me just expand a bit on the "different languages" issue. You've automatically assumed that C++'s result is the correct one, because when you do the calculation manually, you do the evaluation in a certain order, and you've determined that this order matches the result from C++. However, neither C++ nor C# analyses the expression; it's simply a bunch of operations over some values.

C++ does store x in a register, just like C#. It's just that C# stores it before evaluating the method call, while C++ does it after. If you change the C++ code to do x = f() + x instead, just like I've done in C#, I expect you'll get the 1 on output.

The most important part is that C++ (and C) simply didn't specify an explicit order of operations, probably because it wanted to exploit architectures and platforms that use either one of those orders. Since C# and Java were developed at a time when this no longer really mattered, and since they could learn from all those failures of C/C++, they specified an explicit order of evaluation.

Answer 4:

According to the Java language specification:

JLS 15.26.2, Compound Assignment Operators

A compound assignment expression of the form E1 op= E2 is equivalent to E1 = (T) ((E1) op (E2)) , where T is the type of E1 , except that E1 is evaluated only once.

This small program demonstrates the difference, and exhibits expected behavior based on this standard.

public class Start
{
    int X = 0;
    int f()
    {
        X = X + 10;
        return 1;
    }
    public static void main (String[] args) throws java.lang.Exception
    {
        Start actualStart = new Start();
        Start expectedStart = new Start();
        Start diffStart = new Start();  // separate instance, so diff starts from X == 0
        int actual = actualStart.X += actualStart.f();
        int expected = (int)(expectedStart.X + expectedStart.f());
        int diff = (int)(diffStart.f() + diffStart.X);
        System.out.println(actual == expected);
        System.out.println(actual == diff);
    }
}

In order:

actual is assigned the value of actualStart.X += actualStart.f().
expected is assigned the value obtained by
retrieving expectedStart.X, which is 0,
applying the addition operator to it with
the return value of invoking expectedStart.f(), which is 1,
and assigning the result of 0 + 1 to expected.

I also declared diff to show how changing the order of invocation changes the result.

diff is assigned the value obtained by taking
the return value of invoking diffStart.f(), which is 1,
applying the addition operator to that value with
the value of diffStart.X (which is 10, a side effect of diffStart.f()),
and assigning the result of 1 + 10 to diff.

In Java, this is not undefined behavior.

Edit:

To address your point regarding local copies of variables: that is correct, but it has nothing to do with static. Java saves the result of evaluating each side (left side first), then evaluates the result of applying the operator to the saved values.
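
The saving of the left-hand value is easiest to see with an array element, where both the target location and its value are fixed before the right-hand side runs. This is a minimal sketch; the class name, the cells array and the bump() helper are hypothetical and not part of the question.

```java
import java.util.Arrays;

// Hypothetical demo: the left-hand side of += is resolved and its value saved
// before the right-hand side is evaluated.
class SavedOperandDemo {
    static int[] cells;
    static int index;

    // Side effects on both the index and the selected cell, then returns 1.
    static int bump() {
        index = 1;
        cells[0] = 100;
        return 1;
    }

    static int[] run() {
        cells = new int[] {5, 50};
        index = 0;
        // cells[index] is resolved to cells[0] and its value 5 is saved
        // before bump() runs, so the store is cells[0] = 5 + 1.
        cells[index] += bump();
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [6, 50]
    }
}
```

Neither the change to index nor the write of 100 inside bump() affects the outcome, which matches the "E1 is evaluated only once" wording of JLS 15.26.2.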

Source: https://stackoverflow.com/questions/25323202/c-and-php-vs-c-sharp-and-java-unequal-results

Tags

java

php

c++

expression-evaluation