I was implementing a FIFO queue of request instances (preallocated request objects, for speed) and started by using the "synchronized" keyword on the add method. The method was quite short (check whether there is room in the fixed-size buffer, then add the value to the array). According to VisualVM, the thread was blocking more often than I liked (in the "monitor" state, to be precise). So I converted the code to use AtomicInteger values for things such as tracking the current size, calling compareAndSet() in while loops (as AtomicInteger does internally for methods such as incrementAndGet()). The code is now quite a bit longer.
What I am wondering is: what is the performance overhead of using synchronized with shorter code, versus longer code without the synchronized keyword (which should never block on a lock)?
Here is the old get method with the synchronized keyword:
public synchronized Request get()
{
    if (head == tail)
    {
        return null;
    }
    Request r = requests[head];
    head = (head + 1) % requests.length;
    return r;
}
Here is the new get method without the synchronized keyword:
public Request get()
{
    while (true)
    {
        int current = size.get();
        if (current <= 0)
        {
            return null;
        }
        if (size.compareAndSet(current, current - 1))
        {
            break;
        }
    }
    while (true)
    {
        int current = head.get();
        int nextHead = (current + 1) % requests.length;
        if (head.compareAndSet(current, nextHead))
        {
            return requests[current];
        }
    }
}
My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc), even though the code is shorter.
Thanks!
My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc)
Yes, in the common case you are right. Java Concurrency in Practice discusses this in section 15.3.2:
[...] at high contention levels locking tends to outperform atomic variables, but at more realistic contention levels atomic variables outperform locks. This is because a lock reacts to contention by suspending threads, reducing CPU usage and synchronization traffic on the shared memory bus. (This is similar to how blocking producers in a producer-consumer design reduces the load on consumers and thereby lets them catch up.) On the other hand, with atomic variables, contention management is pushed back to the calling class. Like most CAS-based algorithms, AtomicPseudoRandom reacts to contention by trying again immediately, which is usually the right approach but in a high-contention environment just creates more contention.
Before we condemn AtomicPseudoRandom as poorly written or atomic variables as a poor choice compared to locks, we should realize that the level of contention in Figure 15.1 is unrealistically high: no real program does nothing but contend for a lock or atomic variable. In practice, atomics tend to scale better than locks because atomics deal more effectively with typical contention levels.
The performance reversal between locks and atomics at differing levels of contention illustrates the strengths and weaknesses of each. With low to moderate contention, atomics offer better scalability; with high contention, locks offer better contention avoidance. (CAS-based algorithms also outperform lock-based ones on single-CPU systems, since a CAS always succeeds on a single-CPU system except in the unlikely case that a thread is preempted in the middle of the read-modify-write operation.)
(On the figures referred to by the text, Figure 15.1 shows that the performance of AtomicInteger and ReentrantLock is more or less equal when contention is high, while Figure 15.2 shows that under moderate contention the former outperforms the latter by a factor of 2-3.)
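To make the two styles in the quote concrete, here is a minimal sketch of the same counter written once with a monitor and once with a CAS retry loop. The class names (SynchronizedCounter, CasCounter, CounterDemo) are made up for illustration and are not from the book or the question; the point is only that the CAS version retries immediately on a lost race instead of suspending the thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter guarded by the object's monitor.
class SynchronizedCounter {
    private int value;

    // A contended caller may be suspended while it waits for the monitor.
    public synchronized int increment() {
        return ++value;
    }

    public synchronized int get() {
        return value;
    }
}

// Hypothetical counter built on a compareAndSet retry loop.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Never blocks: when another thread wins the race the CAS fails
    // and the loop simply re-reads the value and tries again.
    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public int get() {
        return value.get();
    }
}

class CounterDemo {
    // Increments both counters from several threads and returns the two totals.
    static int[] run(int threads, int perThread) {
        SynchronizedCounter locked = new SynchronizedCounter();
        CasCounter cas = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    locked.increment();
                    cas.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new int[] { locked.get(), cas.get() };
    }
}
```

Both versions end up with the same total; they differ only in how a thread behaves when it loses a race, which is exactly the trade-off the quoted passage describes.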
Update: on nonblocking algorithms
As others have noted, nonblocking algorithms, although potentially faster, are more complex and thus more difficult to get right. A hint from section 15.4 of JCiP:
Good nonblocking algorithms are known for many common data structures, including stacks, queues, priority queues, and hash tables, though designing new ones is a task best left to experts.
Nonblocking algorithms are considerably more complicated than their lock-based equivalents. The key to creating nonblocking algorithms is figuring out how to limit the scope of atomic changes to a single variable while maintaining data consistency. In linked collection classes such as queues, you can sometimes get away with expressing state transformations as changes to individual links and using an AtomicReference to represent each link that must be updated atomically.
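As an illustration of "limiting the scope of atomic changes to a single variable", here is a sketch of the classic Treiber stack, where the only shared mutable state is one AtomicReference to the top node. The class name LinkedStack is my own; this is a simpler structure than the asker's bounded queue and is not a drop-in replacement for it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a nonblocking (Treiber-style) stack. All state changes are
// expressed as a single CAS on the `top` reference.
class LinkedStack<E> {
    private static final class Node<E> {
        final E item;
        Node<E> next;

        Node(E item) {
            this.item = item;
        }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    public void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
            // Retry if another thread changed `top` in the meantime.
        } while (!top.compareAndSet(oldHead, newHead));
    }

    // Returns null when the stack is empty.
    public E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) {
                return null;
            }
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }
}
```

Because every operation is one CAS on one variable, there is no window in which two fields disagree with each other, which is the property a two-counter design has to work much harder to get.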
I wonder whether the JVM already spins a few times before really suspending the thread. It would anticipate that well-written critical sections, like yours, are very short and complete almost immediately. It should therefore optimistically busy-wait for, I don't know, dozens of loops before giving up and suspending the thread. If that's the case, it should behave the same as your second version.
What a profiler shows might be very different from what's really happening in a JVM at full speed, with all kinds of crazy optimizations. It's better to measure and compare throughput without a profiler.
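A rough way to compare throughput without a profiler attached might look like the sketch below. ThroughputProbe and its measure method are hypothetical names, and the numbers from a hand-rolled loop like this are only indicative (JIT warm-up, core count and GC all affect them); a proper harness such as JMH is the safer choice for real decisions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class ThroughputProbe {
    // Runs `task` opsPerThread times on each of `threads` threads and
    // returns the observed operations per second.
    static double measure(int threads, int opsPerThread, Runnable task) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < opsPerThread; i++) {
                    task.run();
                }
            });
            workers[t].start();
        }
        long begin = System.nanoTime();
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - begin);
        return (double) threads * opsPerThread / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        Object monitor = new Object();
        int[] plain = new int[1];
        AtomicInteger atomic = new AtomicInteger();
        Runnable locked = () -> {
            synchronized (monitor) {
                plain[0]++;
            }
        };
        Runnable cas = atomic::incrementAndGet;
        // Warm-up pass so the JIT has compiled both paths before timing.
        measure(4, 100_000, locked);
        measure(4, 100_000, cas);
        System.out.printf("synchronized: %.0f ops/s%n", measure(4, 1_000_000, locked));
        System.out.printf("atomic:       %.0f ops/s%n", measure(4, 1_000_000, cas));
    }
}
```

Vary the thread count to see how each variant reacts as contention goes up.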
Before doing this kind of synchronization optimization, you really need a profiler to tell you that it's absolutely necessary.
Yes, synchronized may under some conditions be slower than an atomic operation, but compare your original and replacement methods. The former is clear and easy to maintain; the latter is definitely more complex. Because of this there may be very subtle concurrency bugs that you will not find during initial testing. I already see one problem: size and head can get out of sync because, although each of these operations is atomic, the combination is not, and sometimes this may lead to an inconsistent state.
So, my advice:
- Start simple
- Profile
- If performance is good enough, leave simple implementation as is
- If you need a performance improvement, then start to get clever (possibly using a more specialized lock at first), and TEST, TEST, TEST
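One way to "start simple" for a fixed-size FIFO is to lean on java.util.concurrent.ArrayBlockingQueue, whose offer() and poll() return immediately (false when full, null when empty) rather than waiting. The sketch below assumes a stand-in Request class and a wrapper named RequestQueue; neither is the asker's actual code.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical stand-in for the asker's preallocated request object.
class Request {
    final int id;

    Request(int id) {
        this.id = id;
    }
}

// Thin wrapper that keeps the add/get shape described in the question.
class RequestQueue {
    private final ArrayBlockingQueue<Request> queue;

    RequestQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false instead of waiting when the buffer is full.
    boolean add(Request r) {
        return queue.offer(r);
    }

    // Returns null instead of waiting when the buffer is empty.
    Request get() {
        return queue.poll();
    }
}
```

The library class keeps its indexes and count consistent under a single internal lock, so there is no pair of counters to drift apart, and it gives you a baseline to profile before trying anything cleverer.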
Here's code for a busy wait lock.
import java.util.concurrent.atomic.AtomicBoolean;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class BusyWaitLock
{
    private static final boolean LOCK_VALUE = true;
    private static final boolean UNLOCK_VALUE = false;
    private static final Logger log = LoggerFactory.getLogger(BusyWaitLock.class);
    /**
     * Thrown when a thread has busy-waited longer than the maximum wait time.
     *
     * @author Rod Moten
     */
    public class BusyWaitLockException extends RuntimeException
    {
        private static final long serialVersionUID = 1L;
        /**
         * @param message
         */
        public BusyWaitLockException(String message)
        {
            super(message);
        }
    }
    private final AtomicBoolean lock = new AtomicBoolean(UNLOCK_VALUE);
    private final long maximumWaitTime;
    /**
     * Create a busy wait lock that uses the default wait time of two minutes.
     */
    public BusyWaitLock()
    {
        this(1000 * 60 * 2); // default is two minutes
    }
    /**
     * Create a busy wait lock that uses the given value as the maximum wait time.
     * @param maximumWaitTime - a positive value that represents the maximum number of milliseconds that a thread will busy wait.
     */
    public BusyWaitLock(long maximumWaitTime)
    {
        if (maximumWaitTime < 1)
            throw new IllegalArgumentException("Max wait time of " + maximumWaitTime + " is too low. It must be at least 1 millisecond.");
        this.maximumWaitTime = maximumWaitTime;
    }
    /**
     * Spin until the lock is acquired or the maximum wait time is exceeded.
     */
    public void lock()
    {
        long startTime = System.currentTimeMillis();
        long lastLogTime = 0; // elapsed milliseconds at which the last debug message was logged
        int logMessageCount = 0;
        // compareAndSet returns true on success, so keep spinning while it fails.
        while (!lock.compareAndSet(UNLOCK_VALUE, LOCK_VALUE)) {
            long waitTime = System.currentTimeMillis() - startTime;
            if (waitTime - lastLogTime > 5000) {
                log.debug("Waiting for lock. Log message # {}", logMessageCount++);
                lastLogTime = waitTime;
            }
            if (waitTime > maximumWaitTime) {
                log.warn("Wait time of {} exceeded maximum wait time of {}", waitTime, maximumWaitTime);
                throw new BusyWaitLockException("Exceeded maximum wait time of " + maximumWaitTime + " ms.");
            }
        }
    }
    /**
     * Release the lock. Call this only from the thread that holds it.
     */
    public void unlock()
    {
        lock.set(UNLOCK_VALUE);
    }
}
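For reference, this is how a spin lock of this kind would typically be used: acquire, do the short critical section, and release in a finally block. To keep the sketch self-contained it uses a stripped-down SpinLock (no logging or timeout) rather than the class above; SpinLock and SpinLockDemo are illustrative names, and Thread.onSpinWait() assumes Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal spin lock sketch: no timeout, no logging, not reentrant.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Spin while the CAS fails, i.e. while another thread holds the lock.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that this is a busy-wait (Java 9+)
        }
    }

    void unlock() {
        locked.set(false);
    }
}

class SpinLockDemo {
    // Increments a plain (non-atomic) counter under the lock from several threads.
    static int run(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        int[] counter = new int[1];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++; // critical section, kept as short as possible
                    } finally {
                        lock.unlock(); // always release, even if the body throws
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }
}
```

A spin lock only pays off when critical sections are very short and there are spare cores; otherwise waiting threads burn CPU that the lock holder could have used.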
Source: https://stackoverflow.com/questions/3556283/in-java-what-is-the-performance-of-atomicinteger-compareandset-versus-synchron