Adding blocking functions to lock-free queue

Submitted by 蓝咒 on 2019-12-21 20:29:07

Question


I have a lock-free multi producer, single consumer queue, based on a circular buffer. So far, it only has non-blocking push_back() and pop_front() calls. Now I want to add blocking versions of those calls, but I want to minimize the impact this has on the performance of code that uses the non-blocking versions - namely, it should not turn them into "lock-by-default" calls.

E.g. the simplest version of a blocking push_back() would look like this:

void push_back_Blocking(const T& pkg) {
    if (!push_back(pkg)) {
        std::unique_lock<std::mutex> ul(mux);
        while (!push_back(pkg)) {
            cv_notFull.wait(ul);
        }
    }
}

but unfortunately this would also require putting the following block at the end of the "non-blocking" pop_front():

{
    std::lock_guard<std::mutex> lg(mux);
    cv_notFull.notify_all();
}

While the notify alone has hardly any performance impact (if no thread is waiting), the lock does.

So my question is:
How can I (using standard C++14 if possible) add blocking push_back and pop_front member functions to my queue without severely impeding the performance of the non-blocking counterparts (read: minimize system calls)? At least as long as no thread is actually blocked - but ideally even then.


For reference, my current version looks similar to this (I left out debug checks, data alignment and explicit memory orderings):

template<class T, size_t N>
class MPSC_queue {
    using INDEX_TYPE = unsigned long;
    struct Idx {
        INDEX_TYPE idx;
        INDEX_TYPE version_cnt;
    };
    enum class SlotState {
        EMPTY,
        FILLED
    };
    struct Slot {
        Slot() = default;
        std::atomic<SlotState> state{SlotState::EMPTY};
        T data{};
    };
    struct Buffer_t {
        std::array<Slot, N> data{};
        // Slots rely on their default member initializers: the std::atomic
        // member makes Slot non-copyable, so array::fill would not compile.
        Buffer_t() = default;
        Slot& operator[](Idx idx) {
            return this->operator[](idx.idx);
        }
        Slot& operator[](INDEX_TYPE idx) {
            return data[idx];
        }
    };

    Buffer_t buffer;
    std::atomic<Idx> head{};
    std::atomic<INDEX_TYPE> tail{0};

    INDEX_TYPE next(INDEX_TYPE old) { return (old + 1) % N; }

    Idx next(Idx old) {
        old.idx = next(old.idx);
        old.version_cnt++;
        return old;
    }
public:     
    bool push_back(const T& val) {
        auto tHead = head.load();
        Idx wrtIdx;
        do {
            wrtIdx = next(tHead);
            if (wrtIdx.idx == tail) {
                return false;
            }
        } while (!head.compare_exchange_strong(tHead, wrtIdx));

        buffer[wrtIdx].data = val;
        buffer[wrtIdx].state = SlotState::FILLED;
        return true;
    }

    bool pop_front(T& val) {                
        auto rIdx = next(tail);
        if (buffer[rIdx].state != SlotState::FILLED) {
            return false;
        }
        val = buffer[rIdx].data;
        buffer[rIdx].state = SlotState::EMPTY;
        tail = rIdx;
        return true;
    }
};
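For experimentation, here is a condensed, compilable variant of the class above: indices are narrowed to 32 bits so that std::atomic&lt;Idx&gt; (8 bytes) stays lock-free on common platforms, and Buffer_t is folded into a plain std::array. This is an illustrative sketch, not a drop-in replacement for the original.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Condensed sketch of the queue above. Debug checks and explicit memory
// orderings are still omitted, as in the original.
template <class T, std::size_t N>
class MPSC_queue {
    using INDEX_TYPE = unsigned int;
    struct Idx {
        INDEX_TYPE idx;
        INDEX_TYPE version_cnt;   // guards against ABA on the head CAS
    };
    enum class SlotState { EMPTY, FILLED };
    struct Slot {
        std::atomic<SlotState> state{SlotState::EMPTY};
        T data{};
    };

    std::array<Slot, N> buffer{};
    std::atomic<Idx> head{Idx{0, 0}};
    std::atomic<INDEX_TYPE> tail{0};

    static INDEX_TYPE next(INDEX_TYPE old) { return (old + 1) % N; }
    static Idx next(Idx old) { return Idx{next(old.idx), old.version_cnt + 1}; }

public:
    bool push_back(const T& val) {
        auto tHead = head.load();
        Idx wrtIdx;
        do {
            wrtIdx = next(tHead);
            if (wrtIdx.idx == tail) {
                return false;             // full: would collide with tail
            }
        } while (!head.compare_exchange_strong(tHead, wrtIdx));
        buffer[wrtIdx.idx].data = val;    // slot is now reserved for this producer
        buffer[wrtIdx.idx].state = SlotState::FILLED;
        return true;
    }

    bool pop_front(T& val) {
        auto rIdx = next(tail.load());
        if (buffer[rIdx].state != SlotState::FILLED) {
            return false;                 // empty, or producer not finished yet
        }
        val = buffer[rIdx].data;
        buffer[rIdx].state = SlotState::EMPTY;
        tail = rIdx;
        return true;
    }
};
```

Note that with N == 4 the queue holds at most three elements: one slot always stays free so that a full queue can be told apart from an empty one.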

Related questions:

I asked a similar question specifically about optimizing the usage of condition_variable::notify here, but it was closed as a supposed duplicate of this question.
I disagree, because that question was about why the mutex is needed for condition variables in general (or rather their pthread equivalent) - focusing on condition_variable::wait - and not about if/how it can be avoided for the notify part. But apparently I didn't make that sufficiently clear (or people simply disagreed with my opinion).

In any case, the answers in the linked question did not help me, and as this was somewhat of an XY problem anyway, I decided to ask another question about the actual problem I have, thus allowing a wider range of possible solutions (maybe there is a way to avoid condition variables altogether).

This question is also very similar, but

  1. It is about C on Linux, and the answers use platform-specific constructs (pthreads and futexes)
  2. The author there asked for efficient blocking calls, but no non-blocking ones at all. I, on the other hand, don't care too much about the efficiency of the blocking calls but want to keep the non-blocking ones as fast as possible.

Answer 1:


If there is a potential waiter on the condition variable, you have to lock the mutex for the notify_all call.

The thing is that the condition check (!push_back(pkg)) is performed before waiting on the condition variable (C++11 provides no other way), so the mutex is the only means that can guarantee consistency between these two actions.

But it is possible to omit the locking (and the notification) when no potential waiter is involved. Just use an additional flag:

class MPSC_queue {
    ... // Original definitions, plus a std::mutex mux and a std::condition_variable cv_notFull
    std::atomic<bool> has_waiters;

public:
    void push_back_Blocking(const T& pkg) {
        if (!push_back(pkg)) {
            std::unique_lock<std::mutex> ul(mux);
            has_waiters.store(true, std::memory_order_relaxed); // #1
            while (!push_back(pkg)) { // #2 inside push_back() method
                cv_notFull.wait(ul);
                // Other waiter may clean flag while we wait. Set it again. Same as #1.
                has_waiters.store(true, std::memory_order_relaxed);
            }
            has_waiters.store(false, std::memory_order_relaxed);
        }
    }

    // Method is the same as the original; shown only for the #2 mark.
    bool push_back(const T& val) {
        auto tHead = head.load();
        Idx wrtIdx;
        do {
            wrtIdx = next(tHead);
            if (wrtIdx.idx == tail) { // #2
                return false;
            }
        } while (!head.compare_exchange_strong(tHead, wrtIdx));

        buffer[wrtIdx].data = val;
        buffer[wrtIdx].state = SlotState::FILLED;
        return true;
    }

    bool pop_front(T& val) {
        // Main work, same as the original pop_front; shown only for the #3 mark.
        auto rIdx = next(tail);
        if (buffer[rIdx].state != SlotState::FILLED) {
            return false;
        }
        val = buffer[rIdx].data;
        buffer[rIdx].state = SlotState::EMPTY;
        tail = rIdx; // #3

        // Notification part
        if(has_waiters.load(std::memory_order_relaxed)) // #4
        {
            // There are potential waiters. Need to lock.
            std::lock_guard<mutex> lg(mux);
            cv_notFull.notify_all();
        }

        return true;
    }
};

Key relations here are:

  1. Setting the flag at #1 and reading tail for the condition check at #2.
  2. Storing tail at #3 and checking the flag at #4.

Both of these relations must share some universal order; that is, #1 should be observed before #2 even by another thread, and the same holds for #3 and #4.

In that case one can guarantee that, if the flag check at #4 finds the flag not set, then any subsequent condition check at #2 will observe the effect of the condition change at #3. So it is safe not to lock (and not to notify), because no waiter is possible.

In your current implementation, the universal order between #1 and #2 is provided by loading tail with the implicit memory_order_seq_cst, and the order between #3 and #4 by storing tail with the implicit memory_order_seq_cst.
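Made explicit, the interplay between the four marked operations reduces to the sketch below. The names has_waiters and tail mirror the answer's code, but full_after_flagging and need_notify are illustrative helpers, not part of the original class.

```cpp
#include <atomic>

// Model of the waiter/notifier handshake with the orderings written out.
std::atomic<bool> has_waiters{false};
std::atomic<unsigned long> tail{0};

// Producer (prospective waiter): set the flag (#1), then re-check fullness (#2).
bool full_after_flagging(unsigned long wrtIdx) {
    has_waiters.store(true, std::memory_order_seq_cst);     // #1
    return wrtIdx == tail.load(std::memory_order_seq_cst);  // #2
}

// Consumer: publish the freed slot (#3), then decide whether to notify (#4).
bool need_notify(unsigned long newTail) {
    tail.store(newTail, std::memory_order_seq_cst);         // #3
    return has_waiters.load(std::memory_order_seq_cst);     // #4
}
```

Weakening these to release/acquire would not forbid reordering the store before the following load within each function, which is exactly the store/load reordering the surrounding text warns about.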

In this "do not lock if there are no waiters" approach, the universal order is the trickiest part. Both relations require read-after-write ordering, which cannot be achieved with any combination of memory_order_acquire and memory_order_release, so memory_order_seq_cst has to be used.
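The answer only shows the blocking producer side. A consumer counterpart, pop_front_Blocking, can be sketched symmetrically with a second condition variable (here cv_notEmpty) and its own flag. The one-slot BlockingDemo class below is a hypothetical stand-in for the real queue, kept minimal so the pattern stays visible; none of these names come from the original code.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// One-slot stand-in for the real queue, demonstrating the symmetric
// "flag before re-check" pattern for a blocking consumer.
class BlockingDemo {
    std::atomic<bool> filled{false};
    int slot = 0;
    std::mutex mux;
    std::condition_variable cv_notEmpty;
    std::atomic<bool> consumer_waiting{false};

public:
    bool push_back(int v) {                 // producer side, non-blocking
        if (filled.load()) return false;
        slot = v;
        filled.store(true);                 // analogue of #3: publish the element
        if (consumer_waiting.load()) {      // analogue of #4: lock only if needed
            std::lock_guard<std::mutex> lg(mux);
            cv_notEmpty.notify_all();
        }
        return true;
    }

    bool pop_front(int& v) {                // consumer side, non-blocking
        if (!filled.load()) return false;
        v = slot;
        filled.store(false);
        return true;
    }

    void pop_front_Blocking(int& v) {       // mirrors push_back_Blocking
        if (!pop_front(v)) {
            std::unique_lock<std::mutex> ul(mux);
            consumer_waiting.store(true);   // analogue of #1
            while (!pop_front(v)) {         // analogue of #2: re-check under flag
                cv_notEmpty.wait(ul);
                consumer_waiting.store(true);
            }
            consumer_waiting.store(false);
        }
    }
};
```

In the fast path, where an element is already available, the blocking call degenerates to the plain pop_front with no mutex traffic, and the producer pays only one atomic flag load when no consumer is waiting - which is the property the question asks for.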



Source: https://stackoverflow.com/questions/32692176/adding-blocking-functions-to-lock-free-queue
