I have 32 machine threads and one ConcurrentHashMap, which contains a lot of keys. Key defines a public method visit().
The solution I will eventually go for is an array of ConcurrentHashMaps instead of one ConcurrentHashMap. This is ad hoc, but it seems relevant for my use case. I don't care about the second step being slow, as it doesn't affect my code's performance. The solution is:
Object Creation:
Array Population (single threaded, not an issue):
Array Iteration (nicely multithreaded, performance gain):
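The three steps above can be sketched in Java along these lines (the class name and the int[] value type are illustrative stand-ins I made up; the increment stands in for the real visit() call):

```java
import java.util.concurrent.ConcurrentHashMap;

public class PartitionedVisit {
    static final int PARTITIONS = 32; // one partition per machine thread

    // Builds the partitioned maps, populates them, and visits every entry in parallel.
    static ConcurrentHashMap<Integer, int[]>[] run(int nKeys) throws InterruptedException {
        // Step 1: object creation -- an array of maps instead of a single map
        @SuppressWarnings("unchecked")
        ConcurrentHashMap<Integer, int[]>[] maps = new ConcurrentHashMap[PARTITIONS];
        for (int i = 0; i < PARTITIONS; i++) maps[i] = new ConcurrentHashMap<>();

        // Step 2: array population (single-threaded) -- route each key to a partition by hash
        for (int key = 0; key < nKeys; key++) {
            int bucket = (Integer.hashCode(key) & 0x7fffffff) % PARTITIONS;
            maps[bucket].put(key, new int[] {0}); // int[] is a stand-in for the real value type
        }

        // Step 3: array iteration -- one thread per partition, no coordination needed
        Thread[] workers = new Thread[PARTITIONS];
        for (int i = 0; i < PARTITIONS; i++) {
            final ConcurrentHashMap<Integer, int[]> part = maps[i];
            workers[i] = new Thread(() -> part.forEach((k, v) -> v[0]++)); // stand-in for visit()
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return maps;
    }

    public static void main(String[] args) throws InterruptedException {
        run(100_000);
    }
}
```

Because each worker owns exactly one map, the iteration phase needs no locking between threads; the partitioning by hash in step 2 is what makes that safe.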
To see the proof-of-concept code, head to my project on GitHub (it has some dependencies from the project, so I can't post it here).
EDIT
Actually, implementing the above proof of concept for my system has proven to be time-consuming, bug-prone and grossly disappointing. Additionally, I've discovered I would have missed many features of the standard-library ConcurrentHashMap. The solution I have been exploring recently, which looks much less ad hoc and much more promising, is to use Scala, which produces bytecode fully interoperable with Java. The proof of concept relies on the stunning library described in this paper, and AFAIK it is currently IMPOSSIBLE to achieve a corresponding solution in vanilla Java without writing thousands of lines of code, given the current state of the standard library and corresponding third-party libraries.
import scala.collection.parallel.mutable.ParHashMap

class Node(value: Int, id: Int) {
  var v = value
  var i = id
  override def toString: String = v.toString
}

object testParHashMap {
  def visit(entry: (Int, Node)) {
    entry._2.v += 1
  }

  def main(args: Array[String]) {
    val hm = new ParHashMap[Int, Node]()
    for (i <- 1 to 10) {
      val node = new Node(0, i)
      hm.put(node.i, node)
    }
    println("========== BEFORE ==========")
    hm.foreach { println }
    hm.foreach { visit }
    println("========== AFTER ==========")
    hm.foreach { println }
  }
}
If I were you, I'd just try iterating the key set of ConcurrentHashMap first. You could try passing the processing of keys off to a thread pool (in bundles, if the task is too lightweight), or even to a ForkJoin task, but you should do that only if it's really necessary.
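Handing off bundles of keys to a thread pool might look like the following sketch (BundledVisit and the int[] value type are hypothetical names of mine; the increment stands in for the real visit() call; note that ConcurrentHashMap's iterators are weakly consistent, so concurrent writers are safe but may or may not be seen):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BundledVisit {
    // Process the keys of the map in bundles on a fixed-size pool.
    static void visitAll(ConcurrentHashMap<Integer, int[]> map,
                         int threads, int bundleSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Integer> bundle = new ArrayList<>(bundleSize);
        for (Integer key : map.keySet()) { // weakly consistent iteration
            bundle.add(key);
            if (bundle.size() == bundleSize) {
                submit(pool, map, bundle);
                bundle = new ArrayList<>(bundleSize);
            }
        }
        if (!bundle.isEmpty()) submit(pool, map, bundle);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    static void submit(ExecutorService pool, ConcurrentHashMap<Integer, int[]> map,
                       List<Integer> keys) {
        pool.submit(() -> {
            for (Integer k : keys) {
                int[] v = map.get(k);
                if (v != null) v[0]++; // stand-in for visit(); key may have been removed
            }
        });
    }
}
```

Bundling amortizes the per-task submission overhead, which matters when each visit() is cheap.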
Having said that, you could use a ConcurrentSkipListMap, from which you can get a NavigableSet of keys. You can then take out partitions from this using the subSet method. However, ConcurrentHashMap has better performance for put and get operations (note also that ConcurrentSkipListMap uses compareTo rather than hashCode). Situations where this is better seem pretty unlikely.
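A sketch of that partitioning idea, assuming Integer keys (partitions() is a hypothetical helper of mine; the subSet views are live, so inserts into a range after partitioning still show up in the corresponding view):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListPartition {
    // Split the key set into roughly equal contiguous ranges via subSet().
    static List<NavigableSet<Integer>> partitions(ConcurrentSkipListMap<Integer, String> map,
                                                  int parts) {
        List<Integer> keys = new ArrayList<>(map.keySet()); // snapshot of the current keys
        List<NavigableSet<Integer>> result = new ArrayList<>();
        int chunk = Math.max(1, keys.size() / parts);
        for (int i = 0; i < keys.size(); i += chunk) {
            Integer from = keys.get(i);
            Integer to = keys.get(Math.min(i + chunk, keys.size()) - 1);
            // inclusive range view [from, to]; each view can go to its own worker thread
            result.add(map.keySet().subSet(from, true, to, true));
        }
        return result;
    }
}
```

Each worker thread could then iterate one range view independently, at the cost of the skip list's slower put/get mentioned above.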
I could try to inherit from ConcurrentHashMap, get my hands on the instances of its inner Segment, try to group them into 32 groups and work on each group separately. This sounds like a hardcore approach though.
Hardcore indeed, but about the only thing I can see that would work. toArray() builds the array by doing an enumeration, so no win there. I can't believe that a synchronized HashSet would be better unless the ratio of visit() runs to other map operations is pretty high.
The problem with working with the Segments is that you will have to be extremely careful that your code is resilient, because I assume other threads may be altering the table at the same time you are visiting the nodes, and you need to avoid the inevitable race conditions. Delicate, for sure.
The big question in my mind is whether this is necessary at all. Has a profiler or timing runs shown you that visiting each of the keys in one thread takes too long? Have you tried using a thread pool for each visit() call, with one thread doing the enumeration and the pool threads doing the visit()?
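That last arrangement, one thread enumerating while pool threads do the visiting, could be sketched as follows (again, int[] and the increment are stand-ins for the real value type and its visit()):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EnumerateAndVisit {
    // The calling thread enumerates; each visit() runs as its own pool task.
    static void visitAll(ConcurrentHashMap<Integer, int[]> map, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int[] value : map.values()) {
            pool.submit(() -> { value[0]++; }); // stand-in for value.visit()
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

One task per entry is the simplest split, but if visit() is very cheap the submission overhead dominates, which is why the bundled variant above may be preferable.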