Is it possible to share a HashMap between threads without locking the entire HashMap?

时光说笑 2021-02-13 05:23

I would like to have a shared struct between threads. The struct has many fields that are never modified and a HashMap, which is. I don't want to lock the whole HashMap for every update; ideally only the entry being modified would be locked.
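
For illustration only (the concrete fields below are hypothetical, not taken from the question), the shape being described is roughly:

    use std::collections::HashMap;
    
    // Hypothetical struct of the kind described: the plain fields are
    // written once and only read afterwards, while the map needs to be
    // updated concurrently from several threads.
    struct Shared {
        name: String,             // never modified
        limit: usize,             // never modified
        values: HashMap<u8, i32>, // modified concurrently
    }
    
    fn main() {
        let shared = Shared {
            name: "example".to_string(),
            limit: 10,
            values: HashMap::new(),
        };
        println!("{} {} {:?}", shared.name, shared.limit, shared.values);
    }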

3 Answers
    小鲜肉 2021-02-13 05:57

    I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?

    In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the HashMap and thus to a given element. If they need to mutate the data, they can get exclusive access to the Mutex.

    Here's an example:

    use std::{
        collections::HashMap,
        sync::{Arc, Mutex, RwLock},
        thread,
        time::Duration,
    };
    
    fn main() {
        let data = Arc::new(RwLock::new(HashMap::new()));
    
        let threads: Vec<_> = (0..10)
            .map(|i| {
                let data = Arc::clone(&data);
                thread::spawn(move || worker_thread(i, data))
            })
            .collect();
    
        for t in threads {
            t.join().expect("Thread panicked");
        }
    
        println!("{:?}", data);
    }
    
    fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
        loop {
            // Assume that the element already exists
            let map = data.read().expect("RwLock poisoned");
    
            if let Some(element) = map.get(&id) {
                let mut element = element.lock().expect("Mutex poisoned");
    
                // Perform our normal work updating a specific element.
                // The entire HashMap only has a read lock, which
                // means that other threads can access it.
                *element += 1;
                thread::sleep(Duration::from_secs(1));
    
                return;
            }
    
            // If we got this far, the element doesn't exist
    
            // Get rid of our read lock and switch to a write lock
            // You want to minimize the time we hold the writer lock
            drop(map);
            let mut map = data.write().expect("RwLock poisoned");
    
            // We use HashMap::entry to handle the case where another thread
            // inserted the same key while we were unlocked.
            thread::sleep(Duration::from_millis(50));
            map.entry(id).or_insert_with(|| Mutex::new(0));
            // Let the loop start us over to try again
        }
    }
    

    This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.

    This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.

    In that case, you can switch to a RwLock<HashMap<K, Arc<Mutex<V>>>>. Once you have a read or write lock, you can then clone the Arc of the value, returning it and unlocking the HashMap.
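
    Here is a minimal sketch of that variant, reusing the placeholder types u8 and i32 from the example above (a sketch, not a definitive implementation):

    use std::{
        collections::HashMap,
        sync::{Arc, Mutex, RwLock},
    };
    
    type Shared = RwLock<HashMap<u8, Arc<Mutex<i32>>>>;
    
    // Clone the Arc while holding the read lock, then release the lock
    // before doing any long-running work on the element.
    fn get_element(data: &Shared, id: u8) -> Option<Arc<Mutex<i32>>> {
        let map = data.read().expect("RwLock poisoned");
        map.get(&id).cloned()
        // The read lock is released here; the cloned Arc keeps the
        // element alive and accessible without holding the map lock.
    }
    
    fn main() {
        let data: Shared = RwLock::new(HashMap::new());
        data.write().expect("RwLock poisoned").insert(1, Arc::new(Mutex::new(0)));
    
        if let Some(element) = get_element(&data, 1) {
            // No HashMap lock is held while we mutate the element.
            *element.lock().expect("Mutex poisoned") += 1;
        }
    
        println!("{:?}", data);
    }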

    The next step up would be to use a crate like arc-swap, which says:

    Then one would lock, clone the [RwLock<Arc<T>>] and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrary long time by a steady inflow of readers.

    The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and non-contended scenarios.
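
    For completeness, here is a sketch of what that might look like with an ArcSwap wrapped around the whole map. This particular shape is my illustration, not spelled out above, and it assumes the arc-swap 1.x API (from_pointee, load, rcu):

    // Assumed dependency in Cargo.toml: arc-swap = "1"
    use std::collections::HashMap;
    use arc_swap::ArcSwap;
    
    fn main() {
        // Readers never block; they just grab a cheap snapshot.
        let data: ArcSwap<HashMap<u8, i32>> = ArcSwap::from_pointee(HashMap::new());
    
        // Writer: rcu() clones the current map, applies the change, and
        // atomically publishes the new version, retrying if it raced.
        data.rcu(|old| {
            let mut new = (**old).clone();
            new.insert(1, 42);
            new
        });
    
        // Reader: load() returns a lock-free snapshot of the current map.
        let snapshot = data.load();
        println!("{:?}", *snapshot);
    }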

    I often advocate for some kind of smarter algorithm instead. For example, you could spin up N threads, each with their own HashMap, and shard the work among them so that each thread owns the keys it handles. For the simple example above, you could route work by id % N_THREADS. There are also more complicated sharding schemes that depend on your data.
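
    A minimal sketch of that sharded design, where each worker owns its HashMap outright and receives work over a channel (the message type here is just a bare id; a real application would send richer commands):

    use std::{collections::HashMap, sync::mpsc, thread};
    
    const N_THREADS: usize = 4;
    
    fn main() {
        let mut senders = Vec::new();
        let mut workers = Vec::new();
    
        // One channel per shard; each worker owns its map, so no locks.
        for _ in 0..N_THREADS {
            let (tx, rx) = mpsc::channel::<u8>();
            senders.push(tx);
            workers.push(thread::spawn(move || {
                let mut map: HashMap<u8, i32> = HashMap::new();
                for id in rx {
                    *map.entry(id).or_insert(0) += 1;
                }
                map
            }));
        }
    
        // Route each piece of work to the shard that owns its key.
        for id in 0u8..20 {
            senders[id as usize % N_THREADS].send(id).expect("worker hung up");
        }
    
        // Dropping the senders closes the channels so the workers finish.
        drop(senders);
        for worker in workers {
            println!("{:?}", worker.join().expect("worker panicked"));
        }
    }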

    As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.
