Usage of HashMap in a multi-threaded environment for regular update

Question

I have a Java application in which I keep the IPs of my other servers in an in-memory hash map. The map holds the mapping from each server's instance ID to its IP address. I also store this server information in a database for persistence.

I am trying to solve a simple problem: I just need to cache the server information in memory for faster access, so I used a hash map. I also need to make sure that the server information in memory is NOT stale and that all the servers in the cache are responsive.

So I have created two separate background daemon threads:

One thread takes each entry from the hash map and pings it. If a server is not responsive, the thread removes that entry from the hash map.
Another thread synchronizes the database entries with this hash map cache. It queries all the entries from the database, removes the hash map entries that are no longer in the DB, and for each new entry in the DB, pings it and adds it to the hash map.

Here the first thread runs frequently, say every 15 seconds, and the second (DB) thread runs every 5 minutes.

Since both threads update the cache, I have used ConcurrentHashMap because it is thread-safe. Even so, after reading several articles, the documentation, and some Stack Overflow posts, I see that having multiple threads update the hash map is risky: while one thread is iterating over the hash map, another thread may get triggered and start updating it.

So how can I solve this with a different approach, so that I don't burden the JVM in terms of application performance and time and space complexity, and make sure that the hash map contains only responsive servers almost all the time?

Answer 1:

ConcurrentHashMap guarantees this:

The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
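To make the quoted guarantee concrete, here is a minimal sketch. The class name, instance IDs and IPs are made up for illustration. It removes and adds entries while iterating over a ConcurrentHashMap; no ConcurrentModificationException is thrown, but whether the loop sees the entry added mid-iteration is not guaranteed, so the visited count can vary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentIteration {

    // Iterates over the cache while modifying it and returns how many
    // entries the loop actually visited.
    static int iterateWhileModifying(Map<String, String> cache) {
        int visited = 0;
        for (Map.Entry<String, String> entry : cache.entrySet()) {
            visited++;
            // Structural changes in the middle of the iteration. A plain
            // HashMap would typically fail here with a
            // ConcurrentModificationException on the next step.
            cache.remove("i-2");
            cache.put("i-new", "10.0.0.99");
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("i-1", "10.0.0.1");
        cache.put("i-2", "10.0.0.2");
        cache.put("i-3", "10.0.0.3");

        int visited = iterateWhileModifying(cache);

        // The visited count is not fixed: the iterator may or may not
        // reflect changes made after it was created.
        System.out.println("visited=" + visited + ", cache=" + cache);
    }
}
```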

What this means is that in the worst case scenario, an update made by one thread won't be seen by the second one until its next iteration. Let's look at what that means for your application:

If a new server is added by the synchronization thread while the pinging thread is running, it might not be pinged in this iteration. It will be pinged only in the next iteration, after 15 seconds. That doesn't seem to be a problem as long as you take this behavior into account (i.e. if you don't run a third thread that removes anything that hasn't been pinged in the last 15 seconds or something similar).

If a server is deleted by the synchronization thread while pinging is in progress, the server might still be pinged, but the server's record will still be deleted from the cache. Again, not a problem.

If the pinging thread removes a server while synchronization is in progress, the synchronization thread might still see that server in the cache. Again, I don't think that's a problem.
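Putting the three cases together, the two background tasks could look roughly like the sketch below. ServerCache, pingAll, syncWith, snapshot, start and the pinger predicate are hypothetical names, not something from the question; the real ping and the database query are assumed to be supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

class ServerCache {
    // instance ID -> IP address
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumption: returns true if the server at the given IP responds.
    private final Predicate<String> pinger;

    ServerCache(Predicate<String> pinger) {
        this.pinger = pinger;
    }

    // Pinging task: drop every server that does not respond.
    void pingAll() {
        for (Map.Entry<String, String> e : cache.entrySet()) {
            if (!pinger.test(e.getValue())) {
                // Remove only if the entry still maps to the IP we pinged,
                // so a concurrent update by the sync task is not lost.
                cache.remove(e.getKey(), e.getValue());
            }
        }
    }

    // Synchronization task: mirror the database contents into the cache.
    void syncWith(Map<String, String> dbEntries) {
        // Drop cached servers that are no longer in the database.
        cache.keySet().retainAll(dbEntries.keySet());
        // Ping servers that are new in the database and add the responsive ones.
        for (Map.Entry<String, String> e : dbEntries.entrySet()) {
            if (!cache.containsKey(e.getKey()) && pinger.test(e.getValue())) {
                cache.putIfAbsent(e.getKey(), e.getValue());
            }
        }
    }

    // Point-in-time copy for readers that need a stable view.
    Map<String, String> snapshot() {
        return new HashMap<>(cache);
    }

    // Runs both tasks on daemon threads: ping every 15 seconds, sync every 5 minutes.
    ScheduledExecutorService start(Supplier<Map<String, String>> dbLoader) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::pingAll, 0, 15, TimeUnit.SECONDS);
        scheduler.scheduleWithFixedDelay(() -> syncWith(dbLoader.get()), 0, 5, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("i-1", "10.0.0.1");
        db.put("i-2", "10.0.0.2");

        // Pretend 10.0.0.2 is down.
        ServerCache servers = new ServerCache(ip -> !ip.equals("10.0.0.2"));
        servers.syncWith(db);
        System.out.println(servers.snapshot()); // prints {i-1=10.0.0.1}
    }
}
```

Keep in mind that a task scheduled with scheduleWithFixedDelay stops being rescheduled if one run throws an exception, so the real ping and DB calls should catch their own failures.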

Answer 2:

Use Collections.synchronizedMap(map) if you need to ensure data consistency and every thread needs an up-to-date view of the data. Use ConcurrentHashMap if performance is critical and each thread only inserts data into the map, with reads happening less frequently.
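One caveat if you go with Collections.synchronizedMap: individual calls such as get and put are synchronized, but per its Javadoc, iterating over any of its collection views must be wrapped in a manual synchronized block on the map itself. A minimal sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SynchronizedMapIteration {

    // Copies all IPs out of a map created by Collections.synchronizedMap.
    static List<String> snapshotIps(Map<String, String> syncMap) {
        List<String> ips = new ArrayList<>();
        // Lock the map itself for the whole iteration; without this block
        // a concurrent put or remove can break the iteration.
        synchronized (syncMap) {
            for (String ip : syncMap.values()) {
                ips.add(ip);
            }
        }
        return ips;
    }

    public static void main(String[] args) {
        Map<String, String> servers = Collections.synchronizedMap(new HashMap<>());
        servers.put("i-1", "10.0.0.1");
        servers.put("i-2", "10.0.0.2");
        System.out.println(snapshotIps(servers));
    }
}
```

While that lock is held, every other thread touching the map is blocked, which is the trade-off against ConcurrentHashMap's weakly consistent iteration.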

There is a very good article explaining the internal workings of maps in Java, with real use cases:

http://java.dzone.com/articles/java-7-hashmap-vs

Hope this helps.

Source: https://stackoverflow.com/questions/25250791/usage-of-hashmap-in-a-multi-threaded-environment-for-regular-update

Tags

java

hashmap

synchronization

concurrenthashmap