What is the name of this locking technique?

Question

I've got a gigantic Trove map and a method that I need to call very often from multiple threads. Most of the time this method returns true. The threads are doing heavy number crunching, and I noticed some contention due to the following method (it's just an example; my actual code is a bit different):

synchronized boolean containsSpecial() {
   return troveMap.contains(key);
}

Note that it's an "append only" map: once a key is added, it stays in there forever (which is important for what comes next, I think).

I noticed that by changing the above to:

boolean containsSpecial() {
    if ( troveMap.contains(key) ) {
        // most of the time (>90%) we shall pass here, dodging lock-acquisition
        return true;
    }
    synchronized (this) {
        return troveMap.contains(key);
    }
}

I get a 20% speedup on my number crunching (verified over many long runs).

Does this optimization look correct (knowing that once a key is there it shall stay there forever)?

What is the name for this technique?

EDIT

The code that updates the map is called way less often than the containsSpecial() method and looks like this (I've synchronized the entire method):

synchronized void addSpecialKeyValue( key, value ) {
    ....
}

Answer 1:

This code is not correct.

Trove doesn't handle concurrent use itself; it's like java.util.HashMap in that regard. So, like HashMap, even seemingly innocent, read-only methods like containsKey() could throw a runtime exception or, worse, enter an infinite loop if another thread modifies the map concurrently. I don't know the internals of Trove, but with HashMap, rehashing when the load factor is exceeded, or removing entries can cause failures in other threads that are only reading.

If the operation takes a significant amount of time compared to lock management, using a read-write lock to eliminate the serialization bottleneck will improve performance greatly. In the class documentation for ReentrantReadWriteLock, there are "Sample usages"; you can use the second example, for RWDictionary, as a guide.

In this case, the map operations may be so fast that the locking overhead dominates. If that's the case, you'll need to profile on the target system to see whether a synchronized block or a read-write lock is faster.

Either way, the important point is that you can't safely remove all synchronization, or you'll have consistency and visibility problems.

Answer 2:

It's called wrong locking ;-) Actually, it is some variant of the double-checked locking approach. And the original version of that approach is just plain wrong in Java.

Java threads are allowed to keep private copies of variables in their local memory (think: core-local cache of a multi-core machine). Any Java implementation is allowed to never write changes back into the global memory unless some synchronization happens.

So, it is very well possible that one of your threads has a local memory in which troveMap.contains(key) evaluates to true. Therefore, it never synchronizes and it never gets the updated memory.
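As a minimal sketch of the safe counterpart to the situation above: a volatile field gives the reader a guarantee that it will observe the writer's update. The class and method names here (VisibilityDemo, publish, awaitPublished) are made up for illustration and have nothing to do with Trove. Note that this only publishes a single flag; it does nothing to protect a compound structure like a map.

```java
final class VisibilityDemo {
    // volatile: a write by one thread happens-before any later read of the
    // same field by another thread, so the reader cannot keep a stale copy.
    private volatile boolean published = false;

    void publish() {
        published = true;
    }

    // Polls the flag until it is set or the timeout elapses.
    // Returns true if the flag was observed as set.
    boolean awaitPublished(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!published) {
            if (System.nanoTime() - deadline > 0) {
                return false; // timed out without seeing the write
            }
            Thread.yield();
        }
        return true;
    }
}
```

Without volatile (or some other synchronization), the Java memory model gives the reader no such guarantee.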

Additionally, what happens when contains() sees an inconsistent view of the troveMap data structure in memory?

Look up the Java memory model for the details. Or have a look at this book: Java Concurrency in Practice.

Answer 3:

This looks unsafe to me. Specifically, the unsynchronized calls will be able to see partial updates, either due to memory visibility (a previous put not getting fully published, since you haven't told the JMM it needs to be) or due to a plain old race. Imagine if TroveMap.contains has some internal variable that it assumes won't change during the course of contains. This code lets that invariant break.

Regarding memory visibility, the problem isn't false negatives (the synchronized double-check covers those), but that Trove's invariants may be violated. For instance, if the map has a counter and requires that counter == someInternalArray.length at all times, an unsynchronized reader may observe a state where that doesn't hold.

My first thought was to make troveMap's reference volatile, and to re-write the reference every time you add to the map:

synchronized (this) {
    troveMap.put(key, value);
    troveMap = troveMap;
}

That way, you're setting up a memory barrier such that anyone who reads the troveMap will be guaranteed to see everything that had happened to it before its most recent assignment -- that is, its latest state. This solves the memory issues, but it doesn't solve the race conditions.

Depending on how quickly your data changes, maybe a Bloom filter could help? Or some other structure that's more optimized for certain fast paths?

Answer 4:

Under the conditions you describe, it's easy to imagine a map implementation for which you can get false negatives by failing to synchronize. The only way I can imagine obtaining false positives is an implementation in which key insertions are non-atomic and a partial key insertion happens to look like another key you are testing for.

You don't say what kind of map you have implemented, but the stock map implementations store keys by assigning references. According to the Java Language Specification:

Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values.

If your map implementation uses object references as keys, then I don't see how you can get in trouble.

EDIT

The above was written in ignorance of Trove itself. After a little research, I found the following post by Rob Eden (one of the developers of Trove) on whether Trove maps are concurrent:

Trove does not modify the internal structure on retrievals. However, this is an implementation detail not a guarantee so I can't say that it won't change in future versions.

So it seems like this approach will work for now but may not be safe at all in a future version. It may be best to use one of Trove's synchronized map classes, despite the penalty.

Answer 5:

I think you would be better off with a ConcurrentHashMap, which doesn't need explicit locking and allows concurrent reads:

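boolean containsSpecial() {
    // troveMap is a ConcurrentHashMap here. Use containsKey: on
    // ConcurrentHashMap, the legacy contains(Object) tests values, not keys.
    return troveMap.containsKey(key);
}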

void addSpecialKeyValue( key, value ) {
    troveMap.putIfAbsent(key,value);
}
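A self-contained version of that idea might look like the sketch below. The class name SpecialKeys, the valueOf helper, and the String/Integer key and value types are assumptions for illustration; the question doesn't say what the Trove map actually holds.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical wrapper around an append-only concurrent map.
final class SpecialKeys {
    private final ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();

    // No explicit lock: safe to call from many threads while writers are active.
    // containsKey is used on purpose; ConcurrentHashMap.contains(Object) tests values.
    boolean containsSpecial(String key) {
        return map.containsKey(key);
    }

    // Append-only insert: the first value stored for a key is kept.
    void addSpecialKeyValue(String key, Integer value) {
        map.putIfAbsent(key, value);
    }

    // Helper for inspecting the stored value (not part of the original question).
    Integer valueOf(String key) {
        return map.get(key);
    }
}
```

One trade-off to keep in mind: ConcurrentHashMap stores boxed objects, so if the Trove map holds primitives, switching will probably cost extra memory on a map this large. That would need measuring.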

Another option is a ReadWriteLock, which allows concurrent reads but no concurrent writes:

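import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

private final ReadWriteLock rwlock = new ReentrantReadWriteLock();

boolean containsSpecial() {
    rwlock.readLock().lock(); // shared: many readers may hold it at once
    try {
        return troveMap.contains(key);
    } finally {
        rwlock.readLock().unlock(); // Lock has unlock(), not release()
    }
}

void addSpecialKeyValue( key, value ) {
    rwlock.writeLock().lock(); // exclusive: blocks readers and other writers
    try {
        //...
        troveMap.put(key,value);
    } finally {
        rwlock.writeLock().unlock();
    }
}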

Answer 6:

Why reinvent the wheel? Simply use ConcurrentHashMap.putIfAbsent.
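For what it's worth, here is a small sketch of how putIfAbsent behaves for an append-only map. The helper name addOnce and the String/Integer types are assumptions for illustration only.

```java
import java.util.concurrent.ConcurrentHashMap;

final class PutIfAbsentDemo {
    // Returns true only for the call that actually inserted the key.
    static boolean addOnce(ConcurrentHashMap<String, Integer> map, String key, Integer value) {
        // putIfAbsent is atomic: it returns null if there was no mapping,
        // otherwise it returns the existing value and leaves the map unchanged.
        return map.putIfAbsent(key, value) == null;
    }
}
```

Because the check and the insert happen as one atomic step, two threads racing to add the same key cannot both succeed, and neither needs a synchronized block.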

Source: https://stackoverflow.com/questions/8420102/what-is-the-name-of-this-locking-technique

Tags

java

optimization

synchronization

locking

trove4j