Implement a high-performance mutex similar to Qt's

青春惊慌失措 2021-02-04 10:59

I have a multithreaded scientific application in which several computing threads (one per core) have to store their results in a common buffer. This requires a mutex mechanism.

2 answers
  • 2021-02-04 11:21

    General Advice

    As some comments mentioned, I would first check whether you can restructure your program so that the mutex implementation is less critical for performance.
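    One possible restructuring, sketched under assumptions: if the results do not have to appear in the common buffer while the computation is still running, each thread can collect into a private vector and the buffer can be assembled once after all threads have joined. The names `compute` and `run_partitioned` are hypothetical, and `compute` merely stands in for the real per-item work.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-item computation.
double compute(std::size_t i) { return static_cast<double>(i) * 0.5; }

// Each worker fills its own private vector. The shared result is assembled
// once after all workers have joined, so no mutex is needed.
std::vector<double> run_partitioned(std::size_t num_items, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::vector<double>> partial(num_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&partial, t, num_items, num_threads] {
            // Strided partition: thread t handles items t, t + n, t + 2n, ...
            for (std::size_t i = t; i < num_items; i += num_threads)
                partial[t].push_back(compute(i));
        });
    }
    for (auto& w : workers) w.join();

    // Single-threaded merge after the parallel phase.
    std::vector<double> merged;
    merged.reserve(num_items);
    for (const auto& p : partial)
        merged.insert(merged.end(), p.begin(), p.end());
    return merged;
}
```

    Whether this fits depends on whether another thread needs to read the buffer while the workers are still producing; if it does, some form of synchronization remains necessary.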
    Also, because multithreading support in standard C++ is fairly new and still somewhat immature, you sometimes have to fall back on platform-specific mechanisms, such as a futex on Linux or critical sections on Windows, or on non-standard libraries such as Qt.
    That said, I can think of two implementation approaches that might speed up your program:

    Spinlock
    If access collisions are very rare and the mutex is only held for short periods (two things one should strive for anyway), it might be most efficient to just use a spinlock, since the basic version needs no system calls and is simple to implement (adapted from cppreference):

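    #include <atomic>
    #include <thread>

    class SpinLock {
        // Must start in the clear state; before C++20 this requires ATOMIC_FLAG_INIT.
        std::atomic_flag locked = ATOMIC_FLAG_INIT;
    public:
        void lock() {
            // Spin until the flag was previously clear, i.e. until we acquired the lock.
            while (locked.test_and_set(std::memory_order_acquire)) {
                std::this_thread::yield(); // not in the source, but might improve performance
            }
        }
        void unlock() {
            locked.clear(std::memory_order_release);
        }
    };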
    

    The drawback, of course, is that waiting threads do not sleep; they keep consuming processor time.
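    As a usage sketch: because the class provides lock() and unlock(), it can be used with std::lock_guard like any other lockable type. The driver function `fill_with_spinlock` below is hypothetical and only illustrates several threads appending to one common buffer; the critical section is deliberately tiny, which is the case where a spinlock is most likely to pay off.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class SpinLock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (locked.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { locked.clear(std::memory_order_release); }
};

// Hypothetical driver: num_threads workers each append per_thread entries
// to a common buffer, holding the spinlock only for the push_back itself.
std::vector<int> fill_with_spinlock(int num_threads, int per_thread) {
    assert(num_threads > 0 && per_thread >= 0);
    SpinLock lock;
    std::vector<int> buffer;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &buffer, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<SpinLock> guard(lock);  // unlocks on scope exit
                buffer.push_back(t);
            }
        });
    }
    for (auto& w : workers) w.join();
    return buffer;
}
```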

    Checked Locking

    This is essentially the idea you demonstrated: first make a fast check, based on an atomic exchange, of whether blocking is actually needed, and use the heavy std::mutex only if it is unavoidable.

    #include <atomic>
    #include <condition_variable>
    #include <mutex>

    struct FastMux {
        // Status of the fast mutex
        std::atomic<bool> locked;
        // Helper mutex and cv on which threads can wait in case of a collision
        std::mutex mux;
        std::condition_variable cv;
        // The maximum number of threads that might be waiting on the cv (conservative estimate)
        std::atomic<int> cntr;

        FastMux() : locked(false), cntr(0) {}

        void lock() {
            // Fast path: if the flag was clear, we now own the lock.
            if (locked.exchange(true)) {
                // Slow path: announce that we may wait, then retry under the helper mutex.
                cntr++;
                {
                    std::unique_lock<std::mutex> ul(mux);
                    cv.wait(ul, [&]{ return !locked.exchange(true); });
                }
                cntr--;
            }
        }
        void unlock() {
            locked = false;
            // Only pay for the helper mutex if some thread might be waiting.
            if (cntr > 0) {
                std::lock_guard<std::mutex> ul(mux);
                cv.notify_one();
            }
        }
    };
    

    Note that the std::mutex is not held between lock() and unlock(); it is only used for handling the condition variable. As a result, under high contention the helper mutex is locked and unlocked more often than a plain std::mutex would be.

    The problem with your implementation is that cv.notify_one(); can be called between if(lockCounter.fetch_add(1, std::memory_order_acquire)>0) and cv.wait(lock);. In that case the notification is lost and your thread might never wake up.
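    The usual way to rule out such a lost wakeup is to keep the awaited state in a variable that is only changed under the mutex and to wait with a predicate. The following is a generic sketch of that pattern, not a patch for the code in the question; the names `Event`, `signal` and `signal_reaches_waiter` are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class Event {
    std::mutex m;
    std::condition_variable cv;
    bool signaled = false;  // protected by m
public:
    void wait() {
        std::unique_lock<std::mutex> lk(m);
        // The predicate is checked under the mutex before sleeping and after
        // every wakeup, so a signal() that already ran is still observed.
        cv.wait(lk, [this] { return signaled; });
        signaled = false;  // consume the signal
    }
    void signal() {
        {
            std::lock_guard<std::mutex> lk(m);
            signaled = true;  // state change happens under the same mutex
        }
        cv.notify_one();
    }
};

// The waiter is released whether signal() runs before or after it
// reaches wait(), because the flag remembers the notification.
bool signal_reaches_waiter() {
    Event e;
    bool released = false;
    std::thread waiter([&] { e.wait(); released = true; });
    e.signal();
    waiter.join();
    assert(released);  // join() cannot return unless wait() was released
    return released;
}
```

    A bare cv.wait(lock) without a predicate has no such memory: if the notification arrives before the thread starts waiting, nothing records that it happened.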

    I did not compare performance against a fixed version of your proposed implementation, though, so you will have to see what works best for you.

  • 2021-02-04 11:42

    Not really an answer by definition, but depending on the specific task, a lock-free queue might let you get rid of the mutex altogether. This would help the design if you have multiple producers and a single consumer (or even multiple consumers). Links:

    • Though not directly C++/STL, Boost.Lockfree provides such a queue.
    • Another option is the lock-free queue implementation in "C++ Concurrency in Action" by Anthony Williams.
    • A Fast Lock-Free Queue for C++
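    To give a rough idea of how such a queue avoids a mutex, here is a minimal single-producer/single-consumer ring buffer built on std::atomic. It is a simplified sketch and not the implementation of any of the linked libraries; the name `SpscRing` is hypothetical. It is only safe with exactly one pushing thread and one popping thread, so a multi-producer setup would need one ring per producer or one of the libraries above.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "one slot is always kept empty");
    std::array<T, Capacity> slots{};
    std::atomic<std::size_t> head{0};  // next slot to read, written by the consumer
    std::atomic<std::size_t> tail{0};  // next slot to write, written by the producer
public:
    // Producer side. Returns false instead of blocking when the ring is full.
    bool try_push(const T& value) {
        const std::size_t t = tail.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % Capacity;
        if (next == head.load(std::memory_order_acquire))
            return false;  // full (holds at most Capacity - 1 items)
        slots[t] = value;
        // Release: the slot write becomes visible before the new tail does.
        tail.store(next, std::memory_order_release);
        return true;
    }
    // Consumer side. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;  // empty
        out = slots[h];
        // Release: the slot read completes before the slot is handed back.
        head.store((h + 1) % Capacity, std::memory_order_release);
        return true;
    }
};
```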

    Update in response to comments:

    Queue size / overflow:

    • Queue overflow can be avoided by i) making the queue large enough or ii) making the producer thread wait before pushing data when the queue is full.
    • Another option would be to use multiple consumers and multiple queues and implement a parallel reduction, but this depends on how the data is processed.

    Consumer thread:

    • The queue could use std::condition_variable and make the consumer thread wait until there is data.
    • Another option would be to poll with a timer, checking at regular intervals whether the queue is non-empty; once it is, the thread can fetch data continuously and then go back into wait mode.
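    The two waiting strategies mentioned above (a producer that waits while the queue is full, and a consumer that waits on a std::condition_variable until there is data) can be sketched together as a bounded blocking queue. Note that this variant is mutex-based rather than lock-free; it only illustrates the waiting logic. The class name `BoundedQueue` and its interface are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class BoundedQueue {
    std::mutex m;
    std::condition_variable not_full;   // producers wait here when the queue is full
    std::condition_variable not_empty;  // consumers wait here when the queue is empty
    std::deque<T> items;
    const std::size_t capacity;
public:
    explicit BoundedQueue(std::size_t cap) : capacity(cap) { assert(cap > 0); }

    // Blocks the producer while the queue is full, so it cannot overflow.
    void push(T value) {
        std::unique_lock<std::mutex> lk(m);
        not_full.wait(lk, [this] { return items.size() < capacity; });
        items.push_back(std::move(value));
        lk.unlock();
        not_empty.notify_one();
    }

    // Blocks the consumer until there is data.
    T pop() {
        std::unique_lock<std::mutex> lk(m);
        not_empty.wait(lk, [this] { return !items.empty(); });
        T value = std::move(items.front());
        items.pop_front();
        lk.unlock();
        not_full.notify_one();
        return value;
    }
};
```

    For the polling variant, the consumer could call wait_for with a timeout on the same condition variable instead of wait, at the cost of periodic wakeups when the queue stays empty.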