Is it possible to share a HashMap between threads without locking the entire HashMap?

前端 未结 3 1388
时光说笑
时光说笑 2021-02-13 05:23

I would like to have a shared struct between threads. The struct has many fields that are never modified and a HashMap, which is. I don\'t want to lock the whole

相关标签:
3条回答
  • 2021-02-13 05:57

    I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?

    In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the HashMap and thus to a given element. If they need to mutate the data, they can get exclusive access to the Mutex.

    Here's an example:

    use std::{
        collections::HashMap,
        sync::{Arc, Mutex, RwLock},
        thread,
        time::Duration,
    };
    
    fn main() {
        let data = Arc::new(RwLock::new(HashMap::new()));
    
        let threads: Vec<_> = (0..10)
            .map(|i| {
                let data = Arc::clone(&data);
                thread::spawn(move || worker_thread(i, data))
            })
            .collect();
    
        for t in threads {
            t.join().expect("Thread panicked");
        }
    
        println!("{:?}", data);
    }
    
    fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
        loop {
            // Assume that the element already exists
            let map = data.read().expect("RwLock poisoned");
    
            if let Some(element) = map.get(&id) {
                let mut element = element.lock().expect("Mutex poisoned");
    
                // Perform our normal work updating a specific element.
                // The entire HashMap only has a read lock, which
                // means that other threads can access it.
                *element += 1;
                thread::sleep(Duration::from_secs(1));
    
                return;
            }
    
            // If we got this far, the element doesn't exist
    
            // Get rid of our read lock and switch to a write lock
            // You want to minimize the time we hold the writer lock
            drop(map);
            let mut map = data.write().expect("RwLock poisoned");
    
            // We use HashMap::entry to handle the case where another thread 
            // inserted the same key while where were unlocked.
            thread::sleep(Duration::from_millis(50));
            map.entry(id).or_insert_with(|| Mutex::new(0));
            // Let the loop start us over to try again
        }
    }
    

    This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.

    This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.

    In that case, you can switch to a RwLock<HashMap<K, Arc<Mutex<V>>>>. Once you have a read or write lock, you can then clone the Arc of the value, returning it and unlocking the hashmap.

    The next step up would be to use a crate like arc-swap, which says:

    Then one would lock, clone the [RwLock<Arc<T>>] and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrary long time by a steady inflow of readers.

    The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and non-contended scenarios.

    I often advocate for performing some kind of smarter algorithm. For example, you could spin up N threads each with their own HashMap. You then shard work among them. For the simple example above, you could use id % N_THREADS, for example. There are also complicated sharding schemes that depend on your data.

    As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.

    0 讨论(0)
  • 2021-02-13 05:59

    Maybe you want to consider rust-evmap:

    A lock-free, eventually consistent, concurrent multi-value map.

    The trade-off is eventual-consistency: Readers do not see changes until the writer refreshes the map. A refresh is atomic and the writer decides when to do it and expose new data to the readers.

    0 讨论(0)
  • 2021-02-13 06:11

    Suppose the key of the data is map-able to a u8

    You can have Arc<HashMap<u8,Mutex<HashMap<Key,Value>>>

    When you initialize the data structure you populate all the first level map before putting it in Arc (it will be immutable after initialization)

    When you want a value from the map you will need to do a double get, something like:

    data.get(&map_to_u8(&key)).unwrap().lock().expect("poison").get(&key)

    where the unwrap is safe because we initialized the first map with all the value.

    to write in the map something like:

    data.get(&map_to_u8(id)).unwrap().lock().expect("poison").entry(id).or_insert_with(|| value);

    It's easy to see contention is reduced because we now have 256 Mutex and the probability of multiple threads asking the same Mutex is low.

    @Shepmaster example with 100 threads takes about 10s on my machine, the following example takes a little more than 1 second.

    use std::{
        collections::HashMap,
        sync::{Arc, Mutex, RwLock},
        thread,
        time::Duration,
    };
    
    fn main() {
        let mut inner = HashMap::new( );
        for i in 0..=u8::max_value() {
            inner.insert(i, Mutex::new(HashMap::new()));
        }
        let data = Arc::new(inner);
    
        let threads: Vec<_> = (0..100)
            .map(|i| {
                let data = Arc::clone(&data);
                thread::spawn(move || worker_thread(i, data))
            })
            .collect();
    
        for t in threads {
            t.join().expect("Thread panicked");
        }
    
        println!("{:?}", data);
    }
    
    fn worker_thread(id: u8, data: Arc<HashMap<u8,Mutex<HashMap<u8,Mutex<i32>>>>> ) {
        loop {
    
            // first unwrap is safe to unwrap because we populated for every `u8`
            if let Some(element) = data.get(&id).unwrap().lock().expect("poison").get(&id) {
                let mut element = element.lock().expect("Mutex poisoned");
    
                // Perform our normal work updating a specific element.
                // The entire HashMap only has a read lock, which
                // means that other threads can access it.
                *element += 1;
                thread::sleep(Duration::from_secs(1));
    
                return;
            }
    
            // If we got this far, the element doesn't exist
    
            // Get rid of our read lock and switch to a write lock
            // You want to minimize the time we hold the writer lock
    
            // We use HashMap::entry to handle the case where another thread
            // inserted the same key while where were unlocked.
            thread::sleep(Duration::from_millis(50));
            data.get(&id).unwrap().lock().expect("poison").entry(id).or_insert_with(|| Mutex::new(0));
            // Let the loop start us over to try again
        }
    }
    
    
    0 讨论(0)
提交回复
热议问题