ConcurrentHashMap: avoid extra object creation with “putIfAbsent”?

北海茫月 2020-12-12 19:10

I am aggregating multiple values for keys in a multi-threaded environment. The keys are not known in advance. I thought I would do something like this:

class         
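For reference, here is a minimal sketch of the eager putIfAbsent pattern the title asks about. It allocates a fresh list on every call, even when the key already has one, and putIfAbsent simply discards the extra list. This is not the asker's original code: the class name `EagerAggregator` and the `listsCreated` counter are hypothetical, added only to make the waste visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class: a sketch of the eager putIfAbsent pattern, not the asker's code.
class EagerAggregator {
    private final ConcurrentMap<String, List<String>> entries =
            new ConcurrentHashMap<String, List<String>>();

    // Hypothetical counter, only here to make the extra allocations visible.
    final AtomicInteger listsCreated = new AtomicInteger();

    public void record(String key, String value) {
        // A new list is allocated on every call, whether or not the key already has one.
        List<String> newList = Collections.synchronizedList(new ArrayList<String>());
        listsCreated.incrementAndGet();
        // putIfAbsent returns the list already mapped to key, or null if newList was inserted.
        List<String> oldList = entries.putIfAbsent(key, newList);
        List<String> values = (oldList != null) ? oldList : newList;
        values.add(value);
    }

    List<String> valuesFor(String key) {
        return entries.get(key);
    }

    int keyCount() {
        return entries.size();
    }

    public static void main(String[] args) {
        EagerAggregator agg = new EagerAggregator();
        agg.record("a", "1");
        agg.record("a", "2");
        agg.record("b", "3");
        agg.record("a", "4");
        // Four lists were allocated, but the map keeps only two of them.
        System.out.println(agg.listsCreated.get() + " lists created, " + agg.keyCount() + " kept");
    }
}
```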


        
  • 2020-12-12 19:49

    I also looked for an answer to this problem. The putIfAbsent method does not actually solve the extra object creation problem; it only makes sure that one of those objects doesn't replace another. Race conditions among threads can still cause multiple objects to be instantiated. I found three solutions to this problem (listed in my order of preference):

    1- If you are on Java 8, the best way to achieve this is probably the new computeIfAbsent method of ConcurrentMap. You just pass it a mapping function, which is executed atomically, at most once per key (at least in the ConcurrentHashMap implementation). Example:

    private final ConcurrentMap<String, List<String>> entries =
            new ConcurrentHashMap<String, List<String>>();
    
    public void method1(String key, String value) {
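        // The list itself must be thread-safe, because several threads may add to it at once.
        entries.computeIfAbsent(key, s -> Collections.synchronizedList(new ArrayList<String>()))
                .add(value);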
    }
    

    This is from the javadoc of ConcurrentHashMap.computeIfAbsent:

    If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
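A quick way to observe the "at most once per key" guarantee quoted above is to count how often the mapping function runs while several threads hammer the same few keys. This is a sketch: the class name `ComputeIfAbsentOnce`, the `mappingCalls` counter and the thread/key counts are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: counts how often computeIfAbsent's mapping function runs.
class ComputeIfAbsentOnce {
    final ConcurrentMap<String, List<String>> entries =
            new ConcurrentHashMap<String, List<String>>();
    final AtomicInteger mappingCalls = new AtomicInteger();

    void record(String key, String value) {
        entries.computeIfAbsent(key, k -> {
            mappingCalls.incrementAndGet(); // ConcurrentHashMap runs this at most once per key
            return Collections.synchronizedList(new ArrayList<String>());
        }).add(value);
    }

    int totalValues() {
        int total = 0;
        for (List<String> list : entries.values()) {
            total += list.size();
        }
        return total;
    }

    // Starts `threads` threads that each record `perThread` values spread over `keys` keys.
    static ComputeIfAbsentOnce run(int threads, final int perThread, final int keys) {
        final ComputeIfAbsentOnce demo = new ComputeIfAbsentOnce();
        final CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<Thread>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                try {
                    start.await(); // release all threads at once to maximise contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    demo.record("key" + (i % keys), "v" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return demo;
    }

    public static void main(String[] args) {
        ComputeIfAbsentOnce demo = run(8, 10000, 10);
        System.out.println("mapping calls: " + demo.mappingCalls.get()
                + ", values recorded: " + demo.totalValues());
    }
}
```

With 10 distinct keys, the mapping function should run exactly 10 times no matter how many threads race, and no value should be lost.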

    2- If you cannot use Java 8, you can use Guava's LoadingCache, which is thread-safe. You give it a load function (just like the compute function above), and if several threads request the same missing key at once, only one of them runs the loader while the others wait for its result. Example:

    private final LoadingCache<String, List<String>> entries = CacheBuilder.newBuilder()
            .build(new CacheLoader<String, List<String>>() {
                @Override
                public List<String> load(String s) throws Exception {
                    return Collections.synchronizedList(new ArrayList<String>());
                }
            });
    
    public void method2(String key, String value) {
        entries.getUnchecked(key).add(value);
    }
    

    3- If you cannot use Guava either, you can always synchronise manually and use double-checked locking. Example:

    private final ConcurrentMap<String, List<String>> entries =
            new ConcurrentHashMap<String, List<String>>();
    
    public void method3(String key, String value) {
        List<String> existing = entries.get(key);
        if (existing != null) {
            existing.add(value);
        } else {
            synchronized (entries) {
                List<String> existingSynchronized = entries.get(key);
                if (existingSynchronized != null) {
                    existingSynchronized.add(value);
                } else {
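                    // Synchronized list: other threads may add to it outside the lock.
                    List<String> newList = Collections.synchronizedList(new ArrayList<String>());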
                    newList.add(value);
                    entries.put(key, newList);
                }
            }
        }
    }
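A self-contained version of method 3 with a small concurrency check might look like the sketch below. Assumptions: the class name and the `listsCreated` counter are hypothetical; `record` adds the value after releasing the lock, a slight restructuring of the code above; and the pattern is only safe as long as every insertion into `entries` goes through this same lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class: method 3 (double-checked locking) with an allocation counter.
class DoubleCheckedAggregator {
    private final ConcurrentMap<String, List<String>> entries =
            new ConcurrentHashMap<String, List<String>>();
    final AtomicInteger listsCreated = new AtomicInteger();

    public void record(String key, String value) {
        List<String> values = entries.get(key); // lock-free fast path for existing keys
        if (values == null) {
            synchronized (entries) {
                values = entries.get(key); // re-check: another thread may have won the race
                if (values == null) {
                    values = Collections.synchronizedList(new ArrayList<String>());
                    listsCreated.incrementAndGet();
                    entries.put(key, values);
                }
            }
        }
        values.add(value); // the list is synchronized, so concurrent adds are safe
    }

    int keyCount() {
        return entries.size();
    }

    int totalValues() {
        int total = 0;
        for (List<String> list : entries.values()) {
            total += list.size();
        }
        return total;
    }

    // Runs `threads` threads, each recording `perThread` values spread over `keys` keys.
    static DoubleCheckedAggregator run(int threads, final int perThread, final int keys) {
        final DoubleCheckedAggregator agg = new DoubleCheckedAggregator();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    agg.record("key" + (i % keys), "v" + i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return agg;
    }

    public static void main(String[] args) {
        DoubleCheckedAggregator agg = run(4, 10000, 10);
        System.out.println(agg.listsCreated.get() + " lists, " + agg.totalValues() + " values");
    }
}
```

Because a list is only created inside the lock after the re-check, exactly one list per key should ever be allocated.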
    

    I wrote an example implementation of all three methods, plus the unsynchronized method that causes extra object creation: http://pastebin.com/qZ4DUjTr

  • 2020-12-12 19:54

    Java 8 introduced an API for exactly this problem, allowing a one-line solution:

    public void record(String key, String value) {
        entries.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>())).add(value);
    }
    

    For Java 7:

    public void record(String key, String value) {
        List<String> values = entries.get(key);
        if (values == null) {
            entries.putIfAbsent(key, Collections.synchronizedList(new ArrayList<String>()));
            // At this point, there will definitely be a list for the key.
            // We don't know or care which thread's new object is in there, so:
            values = entries.get(key);
        }
        values.add(value);
    }
    

    This is the standard code pattern when populating a ConcurrentHashMap.

    The special method putIfAbsent(K, V) will either put your value object in or, if another thread got there first, ignore your value object. Either way, after the call to putIfAbsent(K, V), get(key) returns the single list that ended up in the map, whichever thread put it there (as long as nothing removes entries), so the code above is thread-safe.

    The only wasted overhead occurs when another thread adds an entry for the same key at the same time: you may end up throwing away the newly created value. That only happens if there is no existing entry and your thread loses the race, which is typically rare.
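The behaviour described above comes from putIfAbsent's return value: null when your value was inserted, otherwise the value that was already there (and your value is discarded). A tiny single-threaded sketch, using a hypothetical helper `putTwice`, shows both cases:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentContract {
    // Hypothetical helper: calls putIfAbsent twice for the same key and reports
    // [first return value, second return value, value finally stored].
    static List<String> putTwice(ConcurrentMap<String, String> map, String key,
                                 String first, String second) {
        String r1 = map.putIfAbsent(key, first);  // null if key was absent, in which case `first` was stored
        String r2 = map.putIfAbsent(key, second); // key is present now, so `second` is discarded
        return Arrays.asList(r1, r2, map.get(key));
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<String, String>();
        System.out.println(putTwice(map, "k", "winner", "loser"));
    }
}
```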

  • 2020-12-12 20:02

    In the end, I implemented a slight modification of @Bohemian's answer. His proposed solution overwrites the values variable with the putIfAbsent call, which creates the same problem I had before. The code that seems to work looks like this:

        public void record(String key, String value) {
            List<String> values = entries.get(key);
            if (values == null) {
                values = Collections.synchronizedList(new ArrayList<String>());
                List<String> values2 = entries.putIfAbsent(key, values);
                if (values2 != null)
                    values = values2;
            }
            values.add(value);
        }
    

    It's not as elegant as I'd like, but it's better than the original, which created a new ArrayList instance on every call.
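If the pattern above is needed for several maps, it can be factored into a small generic helper. This is a sketch, not part of the original answer: `ConcurrentMaps.getOrCreate` and the one-method `Factory` interface are hypothetical names, and `Factory` exists only so the helper also compiles on Java 7 (on Java 8+ you would normally use java.util.function.Supplier, or just computeIfAbsent).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentMaps {
    // Hypothetical factory interface; avoids depending on Java 8's Supplier.
    interface Factory<V> {
        V create();
    }

    // Returns the value mapped to key, creating and inserting one if absent.
    // Under contention the factory may run more than once, but only one
    // created value ever ends up in the map; the losers are discarded.
    static <K, V> V getOrCreate(ConcurrentMap<K, V> map, K key, Factory<? extends V> factory) {
        V value = map.get(key);
        if (value == null) {
            V created = factory.create();
            V existing = map.putIfAbsent(key, created);
            value = (existing != null) ? existing : created;
        }
        return value;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, List<String>> entries = new ConcurrentHashMap<String, List<String>>();
        // Anonymous class rather than a lambda, to stay Java 7 compatible.
        Factory<List<String>> newList = new Factory<List<String>>() {
            @Override
            public List<String> create() {
                return Collections.synchronizedList(new ArrayList<String>());
            }
        };
        getOrCreate(entries, "a", newList).add("1");
        getOrCreate(entries, "a", newList).add("2");
        System.out.println(entries);
    }
}
```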

  • 2020-12-12 20:04

    The memory (and GC) cost of creating empty ArrayLists was addressed in Java 7 update 40 (1.7.0_40): an empty ArrayList no longer allocates its backing array until the first element is added. So don't worry about creating an empty ArrayList. Reference: http://javarevisited.blogspot.com.tr/2014/07/java-optimization-empty-arraylist-and-Hashmap-cost-less-memory-jdk-17040-update.html

  • 2020-12-12 20:09

    As of Java 8, you can create multimaps using the following pattern:

    public void record(String key, String value) {
        entries.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>()))
                .add(value);
    }

    The ConcurrentHashMap documentation (not the general ConcurrentMap contract) specifies that the ArrayList will be created only once per key, at the slight cost of blocking some updates while the list for a new key is being created:

    http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#computeIfAbsent-K-java.util.function.Function-

  • 2020-12-12 20:11

    The putIfAbsent approach has the fastest execution time: in environments with high contention it is 2 to 50 times faster than the "lambda" (computeIfAbsent) approach. The lambda isn't the reason for this slowdown; the cause is the mandatory locking inside computeIfAbsent before the Java 9 optimisations.
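A commonly used mitigation on Java 8, sketched below under the assumption that the slowdown comes from computeIfAbsent locking even when the key is already present, is to try a plain lock-free get() first and fall back to computeIfAbsent only on a miss. The helper name `getOrCompute` is hypothetical, and since the answer mentions Java 9 optimisations, it is worth measuring on your own JDK before adopting it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class GetThenCompute {
    // Hypothetical helper: lock-free get() on the hot path, computeIfAbsent only on a miss.
    // ConcurrentHashMap never stores null values, so null from get() means "absent".
    static <K, V> V getOrCompute(ConcurrentMap<K, V> map, K key,
                                 Function<? super K, ? extends V> mappingFunction) {
        V value = map.get(key);
        return (value != null) ? value : map.computeIfAbsent(key, mappingFunction);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, AtomicInteger> counts = new ConcurrentHashMap<String, AtomicInteger>();
        for (String s : new String[] {"A", "B", "A", "C", "A"}) {
            getOrCompute(counts, s, k -> new AtomicInteger()).incrementAndGet();
        }
        System.out.println(counts);
    }
}
```

The at-most-once guarantee is preserved, because every miss still goes through computeIfAbsent.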

    The benchmark:

    import java.util.Random;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicLong;
    
    public class ConcurrentHashMapTest {
        private final static int numberOfRuns = 1000000;
        private final static int numberOfThreads = Runtime.getRuntime().availableProcessors();
        private final static int keysSize = 10;
        private final static String[] strings = new String[keysSize];
        static {
            for (int n = 0; n < keysSize; n++) {
                strings[n] = "" + (char) ('A' + n);
            }
        }
    
        public static void main(String[] args) throws InterruptedException {
            for (int n = 0; n < 20; n++) {
                testPutIfAbsent();
                testComputeIfAbsentLamda();
            }
        }
    
        private static void testPutIfAbsent() throws InterruptedException {
            final AtomicLong totalTime = new AtomicLong();
            final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
            final Random random = new Random();
            ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
    
            for (int i = 0; i < numberOfThreads; i++) {
                executorService.execute(new Runnable() {
                    @Override
                    public void run() {
                        long start, end;
                        for (int n = 0; n < numberOfRuns; n++) {
                            String s = strings[random.nextInt(strings.length)];
                            start = System.nanoTime();
    
                            AtomicInteger count = map.get(s);
                            if (count == null) {
                                count = new AtomicInteger(0);
                                AtomicInteger prevCount = map.putIfAbsent(s, count);
                                if (prevCount != null) {
                                    count = prevCount;
                                }
                            }
                            count.incrementAndGet();
                            end = System.nanoTime();
                            totalTime.addAndGet(end - start);
                        }
                    }
                });
            }
            executorService.shutdown();
            executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
            System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
                    + " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
        }
    
        private static void testComputeIfAbsentLamda() throws InterruptedException {
            final AtomicLong totalTime = new AtomicLong();
            final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
            final Random random = new Random();
            ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
            for (int i = 0; i < numberOfThreads; i++) {
                executorService.execute(new Runnable() {
                    @Override
                    public void run() {
                        long start, end;
                        for (int n = 0; n < numberOfRuns; n++) {
                            String s = strings[random.nextInt(strings.length)];
                            start = System.nanoTime();
    
                            AtomicInteger count = map.computeIfAbsent(s, (k) -> new AtomicInteger(0));
                            count.incrementAndGet();
    
                            end = System.nanoTime();
                            totalTime.addAndGet(end - start);
                        }
                    }
                });
            }
            executorService.shutdown();
            executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
            System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
                    + " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
        }
    
    }
    

    The results:

    Test testPutIfAbsent average time per run: 115.756501 ns
    Test testComputeIfAbsentLamda average time per run: 276.9667055 ns
    Test testPutIfAbsent average time per run: 134.2332435 ns
    Test testComputeIfAbsentLamda average time per run: 223.222063625 ns
    Test testPutIfAbsent average time per run: 119.968893625 ns
    Test testComputeIfAbsentLamda average time per run: 216.707419875 ns
    Test testPutIfAbsent average time per run: 116.173902375 ns
    Test testComputeIfAbsentLamda average time per run: 215.632467375 ns
    Test testPutIfAbsent average time per run: 112.21422775 ns
    Test testComputeIfAbsentLamda average time per run: 210.29563725 ns
    Test testPutIfAbsent average time per run: 120.50643475 ns
    Test testComputeIfAbsentLamda average time per run: 200.79536475 ns
    