Suppose we have a volatile int a. One thread does

    while (true) {
        a = 1;
        a = 0;
    }

and another thread does

w
Short answer:
Yes, this optimization is allowed. Collapsing two sequential read operations produces the observable behavior of the sequence being atomic, but does not appear as a reordering of operations. Any sequence of actions performed on a single thread of execution can be executed as an atomic unit. In general, it is difficult to ensure a sequence of operations executes atomically, and it rarely results in a performance gain, because most execution environments introduce overhead to execute items atomically.
In the example given by the original question, the sequence of operations in question is the following:
    read(a)
    read(a)

Performing these operations atomically guarantees that the value read on the first line is equal to the value read on the second line. Furthermore, it means the value read on the second line is the value contained in a at the time the first read was executed (and vice versa, because atomically both read operations occurred at the same moment according to the observable execution state of the program). The optimization in question, reusing the value of the first read for the second read, is equivalent to the compiler and/or JIT executing the sequence atomically, and is thus valid.
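As a concrete sketch of what the collapse means (the class and field names here are my own; the original question's second thread is not fully quoted above):

```java
class Collapse {
    static volatile int a;

    // Two separate volatile reads: another thread's write may land between
    // them, so with a writer flipping a between 0 and 1 the sum could be odd.
    static int twoReads() {
        return a + a;
    }

    // The collapsed form: one volatile read, value reused. Both "reads" observe
    // the same instant, so the result is always even -- exactly the behavior of
    // executing the two reads as an atomic unit.
    static int oneRead() {
        int r = a;   // single read of a
        return r + r;
    }

    public static void main(String[] args) {
        a = 1;
        System.out.println(twoReads()); // 2 here, since no writer is running
        System.out.println(oneRead());  // always even, whatever other threads do
    }
}
```

The point of the short answer is that oneRead() is an allowed implementation of twoReads().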
Original longer answer:
The Java Memory Model describes operations using a happens-before partial ordering. In order to express the restriction that the first read r1 and second read r2 of a cannot be collapsed, you need to show that some operation is semantically required to appear between them.

The operations on the thread with r1 and r2 are the following:

    --> r(a) --> r(a) --> add -->
To express the requirement that something (say y) lie between r1 and r2, you need to require that r1 happens-before y and y happens-before r2. As it happens, there is no rule where a read operation appears on the left side of a happens-before relationship. The closest you could get is saying y happens-before r2, but the partial order would allow y to also occur before r1, thus collapsing the read operations.
If no scenario exists which requires an operation to fall between r1 and r2, then you can declare that no operation ever appears between r1 and r2 and not violate the required semantics of the language. Using a single read operation would be equivalent to this claim.
Edit: My answer is getting voted down, so I'm going to go into additional detail.
Here are some related questions:
Is the Java compiler or JVM required to collapse these read operations?
No. The expressions a and a used in the add expression are not constant expressions, so there is no requirement that they be collapsed.
Does the JVM collapse these read operations?
To this, I'm not sure of the answer. By compiling a program and using javap -c, it's easy to see that the Java compiler does not collapse these read operations. Unfortunately, it's not as easy to prove that the JVM does not collapse the operations (or, even tougher, the processor itself).
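For reference, here is a minimal class to try this on (class and field names are my own). Compiling it with javac and running javap -c on the result shows two separate getstatic instructions for a in sum(), confirming that the compiler emits both reads:

```java
// Compile:  javac ReadPair.java
// Inspect:  javap -c ReadPair
class ReadPair {
    static volatile int a;

    // Two reads of a in source produce two getstatic instructions in the
    // bytecode; javac performs no collapsing of volatile reads here.
    static int sum() {
        return a + a;
    }

    public static void main(String[] args) {
        a = 3;
        System.out.println(sum()); // prints 6 with no concurrent writer
    }
}
```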
Should the JVM collapse these read operations?
Probably not. Each optimization takes time to execute, so there is a balance between the time it takes to analyze the code and the benefit you expect to gain. Some optimizations, such as array bounds check elimination or checking for null references, have proven to have extensive benefits for real-world applications. The only case where this particular optimization could improve performance is when two identical read operations appear sequentially.
Furthermore, as shown by the response to this answer along with the other answers, this particular change would result in an unexpected behavior change for certain applications which users may not desire.
Edit 2: Regarding Rafael's description of a claim that two read operations cannot be reordered. This statement is designed to highlight the fact that caching the read operation of a in the following sequence could produce an incorrect result:
    a1 = read(a)
    b1 = read(b)
    a2 = read(a)
    result = op(a1, b1, a2)
Suppose initially a and b have their default value 0. Then you execute just the first read(a).
Now suppose another thread executes the following sequence:
    a = 1
    b = 1
Finally, suppose the first thread executes the line read(b). If you were to cache the originally read value of a, you would end up with the following call:

    op(0, 1, 0)

This is not correct. Since the updated value of a was stored before writing to b, there is no way to read the value b1 = 1 and then read the value a2 = 0. Without caching, the correct sequence of events leads to the following call:

    op(0, 1, 1)
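This impossibility can be exercised in code (names are mine; a racy run proves nothing on its own, but the assertion below is guaranteed by the volatile happens-before rule, so it can never fire):

```java
class NoStaleRead {
    static volatile int a, b;

    // Runs the reader/writer race once. Returns true iff the JMM-forbidden
    // observation (b1 == 1 && a2 == 0) occurred -- which it never can.
    static boolean raceOnce() throws InterruptedException {
        a = 0;
        b = 0;
        Thread writer = new Thread(() -> {
            a = 1; // this volatile write happens-before...
            b = 1; // ...this one, by program order within the writer
        });
        writer.start();

        int a1 = a; // first read of a
        int b1 = b; // read of b
        int a2 = a; // second read of a

        writer.join();

        // If the reader saw b == 1, the volatile read of b synchronizes-with
        // the write b = 1, so the earlier write a = 1 is visible and a2 must
        // be 1. op(a1, 1, 0) is impossible.
        return b1 == 1 && a2 == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10_000; i++) {
            if (raceOnce()) {
                throw new AssertionError("impossible under the JMM");
            }
        }
        System.out.println("forbidden result op(_, 1, 0) never observed");
    }
}
```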
However, if you were to ask the question "Is there any way to allow the read of a to be cached?", the answer is yes. If you can execute all three read operations in the first thread's sequence as an atomic unit, then caching the value is allowed. While synchronizing across multiple variables is difficult and rarely provides an opportunistic optimization advantage, it is certainly conceivable to encounter an exception. For example, suppose a and b are each 4 bytes and appear sequentially in memory, with a aligned on an 8-byte boundary. A 64-bit process could implement the sequence read(a) read(b) as an atomic 64-bit load operation, which would allow the value of a to be cached (effectively treating all three read operations as an atomic operation instead of just the first two).
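A fused load of this kind can even be expressed in pure Java by packing the two 4-byte values into a single volatile long (the packing scheme and names are my own illustration; the JVM does not do this for you automatically):

```java
class PackedPair {
    // High 32 bits hold "a", low 32 bits hold "b". A volatile long read is a
    // single atomic 64-bit load, so both values come from the same instant.
    static volatile long ab;

    static void write(int a, int b) {
        ab = ((long) a << 32) | (b & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        write(1, 1);
        long snapshot = ab;                // one atomic load covers read(a), read(b)
        int a1 = (int) (snapshot >>> 32);  // extract a
        int b1 = (int) snapshot;           // extract b
        int a2 = a1;                       // "caching" a is safe: same snapshot
        System.out.println("op(" + a1 + ", " + b1 + ", " + a2 + ")"); // op(1, 1, 1)
    }
}
```

Because both values come from one atomic load, reusing a1 as a2 can never produce the forbidden op(_, 1, 0) result from the earlier sequence.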