Are “data races” and “race condition” actually the same thing in context of concurrent programming

I often find these terms being used in context of concurrent programming . Are they the same thing or different ?

Baris Kasikci

No, they are not the same thing. They are not a subset of one another. They are also neither the necessary, nor the sufficient condition for one another.

The definition of a data race is pretty clear, and therefore, its discovery can be automated. A data race occurs when 2 instructions from different threads access the same memory location, at least one of these accesses is a write and there is no synchronization that is mandating any particular order among these accesses.

A race condition is a semantic error. It is a flaw that occurs in the timing or the ordering of events that leads to erroneous program behavior. Many race conditions can be caused by data races, but this is not necessary.

Consider the following simple example where x is a shared variable:

Thread 1    Thread 2

 lock(l)     lock(l)
 x=1         x=2
 unlock(l)   unlock(l)

In this example, the writes to x from thread 1 and 2 are protected by locks, therefore they are always happening in some order enforced by the order with which the locks are acquired at runtime. That is, the writes' atomicity cannot be broken; there is always a happens before relationship between the two writes in any execution. We just cannot know which write happens before the other a priori.

There is no fixed ordering between the writes, because locks cannot provide this. If the programs' correctness is compromised, say when the write to x by thread 2 is followed by the write to x in thread 1, we say there is a race condition, although technically there is no data race.

It is far more useful to detect race conditions than data races; however this is also very difficult to achieve.

Constructing the reverse example is also trivial. This blog post also explains the difference very well, with a simple bank transaction example.

According to Wikipedia, the term "race condition" has been in use since the days of the first electronic logic gates. In the context of Java, a race condition can pertain to any resource, such as a file, network connection, a thread from a thread pool, etc.

The term "data race" is best reserved for its specific meaning defined by the JLS.

The most interesting case is a race condition that is very similar to a data race, but still isn't one, like in this simple example:

class Race {
  static volatile int i;
  static int uniqueInt() { return i++; }
}

Since i is volatile, there is no data race; however, from the program correctness standpoint there is a race condition due to the non-atomicity of the two operations: read i, write i+1. Multiple threads may receive the same value from uniqueInt.

No, they are different & neither of them is a subset of one or vice-versa.

The term race condition is often confused with the related term data race, which arises when synchronization is not used to coordinate all access to a shared nonfinal field. You risk a data race whenever a thread writes a variable that might next be read by another thread or reads a variable that might have last been written by another thread if both threads do not use synchronization; code with data races has no useful defined semantics under the Java Memory Model. Not all race conditions are data races, and not all data races are race conditions, but they both can cause concurrent programs to fail in unpredictable ways.

Taken from the excellent book - Java Concurrency in Practice by Joshua Bloch & Co.

来源：https://stackoverflow.com/questions/11276259/are-data-races-and-race-condition-actually-the-same-thing-in-context-of-conc

标签

multithreading

concurrency

language-agnostic

data-race