Circular lock-free buffer

悲哀的现实 2020-11-29 15:00

I'm in the process of designing a system which connects to one or more streams of data feeds, does some analysis on the data, and then triggers events based on the results. In a t…

18 Answers
  • 2020-11-29 15:50

    Sutter's queue is sub-optimal, and he knows it. The Art of Multiprocessor Programming is a great reference, but don't trust the Java guys on memory models, period. Ross's links will get you no definitive answer either, because the libraries they point to ran into exactly these problems, and so on.

    Doing lock-free programming is asking for trouble, unless you want to spend a lot of time on something you are clearly over-engineering before solving the actual problem (judging by the description, this is the common madness of 'looking for perfection' in cache coherency). It takes years, and it leads to not solving the problem first and optimising later, a common disease.

  • 2020-11-29 15:52

    The term of art for what you want is a lock-free queue. There's an excellent set of notes with links to code and papers by Ross Bencina. The guy whose work I trust the most is Maurice Herlihy (for Americans, he pronounces his first name like "Morris").

  • 2020-11-29 15:53

    Here is how I would do it:

    • map the queue into an array
    • keep state with next-read and next-write indexes
    • keep an empty/full bit vector around

    Insertion consists of using a CAS to increment and roll over the next-write index. Once you have a slot, store your value and then set the empty/full bit that matches it.

    Removals require checking the bit first to test for underflow, but other than that they mirror the write: use the read index and clear the empty/full bit. (A sketch along these lines follows the caveats below.)

    Be warned,

    1. I'm no expert in these things.
    2. Atomic ASM ops seem to be very slow when I've used them, so if you end up with more than a few of them, you might be faster using locks embedded inside the insert/remove functions. The theory is that a single atomic op to grab the lock, followed by (very) few non-atomic ASM ops, might be faster than the same work done by several atomic ops. Making this work would require manual or automatic inlining, so it's all one short block of ASM.
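
    One caveat with a plain empty/full bit under multiple producers and consumers: a bit left set from the previous lap around the ring is indistinguishable from the current lap's, so a reader can pick up a stale value. The usual fix is to widen the bit into a per-slot sequence counter, as in Dmitry Vyukov's bounded MPMC queue design. Here is a minimal sketch along those lines (my own illustration of the scheme, not the answerer's code; names and sizes are arbitrary):

    // Sketch of the CAS-claim scheme above, with the empty/full bit
    // widened into a per-slot sequence counter (Vyukov-style bounded
    // MPMC queue) so laps around the ring cannot be confused.
    #include <atomic>
    #include <cstddef>
    #include <cstdint>

    template <typename T, size_t N>          // N must be a power of two
    class cas_ring
    {
        static_assert((N & (N - 1)) == 0, "N must be a power of two");

        struct slot
        {
            std::atomic<size_t> seq;         // generalised empty/full bit
            T value;
        };

    public:
        cas_ring()
        {
            for (size_t i = 0; i < N; ++i)
                slots[i].seq.store(i, std::memory_order_relaxed);
        }

        bool try_push(const T& v)
        {
            size_t pos = tail.load(std::memory_order_relaxed);
            for (;;)
            {
                slot& s = slots[pos & (N - 1)];
                size_t seq = s.seq.load(std::memory_order_acquire);
                intptr_t diff = (intptr_t)seq - (intptr_t)pos;
                if (diff == 0)               // slot is empty for this lap
                {
                    // CAS with increment claims the slot; index wraps via masking
                    if (tail.compare_exchange_weak(pos, pos + 1,
                                                   std::memory_order_relaxed))
                        break;               // pos is now ours
                }
                else if (diff < 0)
                    return false;            // queue full
                else
                    pos = tail.load(std::memory_order_relaxed);
            }
            slot& s = slots[pos & (N - 1)];
            s.value = v;
            s.seq.store(pos + 1, std::memory_order_release);   // set "full"
            return true;
        }

        bool try_pop(T& out)
        {
            size_t pos = head.load(std::memory_order_relaxed);
            for (;;)
            {
                slot& s = slots[pos & (N - 1)];
                size_t seq = s.seq.load(std::memory_order_acquire);
                intptr_t diff = (intptr_t)seq - (intptr_t)(pos + 1);
                if (diff == 0)               // slot is filled for this lap
                {
                    if (head.compare_exchange_weak(pos, pos + 1,
                                                   std::memory_order_relaxed))
                        break;
                }
                else if (diff < 0)
                    return false;            // queue empty
                else
                    pos = head.load(std::memory_order_relaxed);
            }
            slot& s = slots[pos & (N - 1)];
            out = s.value;
            s.seq.store(pos + N, std::memory_order_release);   // clear for next lap
            return true;
        }

    private:
        slot slots[N];
        std::atomic<size_t> head{0};
        std::atomic<size_t> tail{0};
    };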
  • 2020-11-29 15:55

    This is an old thread, but since it hasn't been mentioned yet: there is a lock-free, circular, single-producer/single-consumer FIFO available in the JUCE C++ framework.

    https://www.juce.com/doc/classAbstractFifo#details
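
    For reference, here is a usage sketch (my illustration, based on the linked documentation; check it for the exact current API and header path). AbstractFifo only manages the read/write indices; you provide the storage, and an operation may come back split into two contiguous regions when it wraps around:

    #include <juce_core/juce_core.h>   // header path depends on your JUCE setup

    juce::AbstractFifo fifo { 1024 };  // index bookkeeping only, no storage
    float storage[1024];               // the actual ring storage

    // Producer side: a wrapped write comes back as two regions.
    void write_samples (const float* src, int num)
    {
        int start1, size1, start2, size2;
        fifo.prepareToWrite (num, start1, size1, start2, size2);
        for (int i = 0; i < size1; ++i) storage[start1 + i] = src[i];
        for (int i = 0; i < size2; ++i) storage[start2 + i] = src[size1 + i];
        fifo.finishedWrite (size1 + size2);
    }

    The consumer side mirrors this with prepareToRead/finishedRead.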

  • 2020-11-29 15:58

    I am no expert in hardware memory models or lock-free data structures, so I tend to avoid them in my projects and go with traditional locked data structures instead.

    However, I recently came across this video: Lockless SPSC queue based on ring buffer.

    It is based on an open-source, high-performance Java library called the LMAX Disruptor, used by a trading system: LMAX Disruptor.

    Based on the presentation above, you make the head and tail pointers atomic and atomically check for the condition where the head catches the tail from behind, or vice versa.

    Below is a very basic C++11 implementation of it:

    // USING SEQUENTIAL MEMORY: all loads and stores below use the default
    // std::memory_order_seq_cst; acquire/release ordering would suffice and
    // be cheaper, but sequential consistency keeps the example simple.
    #include <thread>
    #include <atomic>
    #include <cinttypes>
    using namespace std;
    
    #define RING_BUFFER_SIZE 1024  // power of 2 for an efficient %
    class lockless_ring_buffer_spsc
    {
        public :
    
            lockless_ring_buffer_spsc()
            {
                write.store(0);
                read.store(0);
            }
    
            bool try_push(int64_t val)
            {
                const auto current_tail = write.load();
                const auto next_tail = increment(current_tail);
                // one slot is kept unused so that full and empty are distinguishable
                if (next_tail != read.load())
                {
                    buffer[current_tail] = val;
                    write.store(next_tail);  // publish only after the value is in place
                    return true;
                }
    
                return false;  // buffer full
            }
    
            void push(int64_t val)
            {
                while( ! try_push(val) );
                // TODO: exponential backoff / sleep
            }
    
            bool try_pop(int64_t* pval)
            {
                auto currentHead = read.load();
    
                if (currentHead == write.load())
                {
                    return false;  // buffer empty
                }
    
                *pval = buffer[currentHead];
                read.store(increment(currentHead));
    
                return true;
            }
    
            int64_t pop()
            {
                int64_t ret;
                while( ! try_pop(&ret) );
                // TODO: exponential backoff / sleep
                return ret;
            }
    
        private :
            std::atomic<int64_t> write;
            std::atomic<int64_t> read;
            static const int64_t size = RING_BUFFER_SIZE;
            int64_t buffer[RING_BUFFER_SIZE];
    
            int64_t increment(int64_t n)  // was int: avoid narrowing the 64-bit index
            {
                return (n + 1) % size;
            }
    };
    
    int main(int argc, char** argv)
    {
        lockless_ring_buffer_spsc queue;
    
        std::thread write_thread( [&] () {
            for(int i = 0; i < 1000000; i++)
            {
                queue.push(i);
            }
        } );
    
        std::thread read_thread( [&] () {
            for(int i = 0; i < 1000000; i++)
            {
                queue.pop();
            }
        } );
    
        write_thread.join();
        read_thread.join();
    
        return 0;
    }
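
    One refinement worth mentioning (my addition, not from the video): the Disruptor goes to some length to avoid false sharing, and the same concern applies here. Since read and write are declared next to each other, they likely share a cache line, so every update by one thread invalidates the other thread's cached copy. Padding them onto separate cache lines helps:

    // Assumes 64-byte cache lines; the alignas value is illustrative.
    alignas(64) std::atomic<int64_t> write;
    alignas(64) std::atomic<int64_t> read;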
    
  • 2020-11-29 15:58

    Some time ago, I found a nice solution to this problem. I believe it is the smallest one found so far.

    The repository has an example of how to use it to create N threads (readers and writers) and have them share a single seat.

    I ran some benchmarks on the test example and got the following results (in millions of ops/sec):

    [chart: throughput by buffer size]

    [chart: throughput by number of threads]

    Notice how the number of threads does not change the throughput.

    I think this is the ultimate solution to this problem. It works and is incredibly fast and simple, even with hundreds of threads and a queue of a single position. It can be used as a pipeline between threads, allocating space inside the queue.

    The repository has some early versions written in C# and Pascal. I'm working on making it more complete and polished to show its real power.

    I hope some of you can validate the work or help with some ideas. Or at least, can you break it?
