I'm in the process of designing a system which connects to one or more streams of data feeds, does some analysis on the data, and then triggers events based on the result. In a t
I would agree with this article and recommend against using lock-free data structures. A relatively recent paper on lock-free FIFO queues is this; search for further papers by the same author(s). There's also a PhD thesis from Chalmers on lock-free data structures (I lost the link). However, you did not say how large your elements are -- lock-free data structures work efficiently only with word-sized items, so you'll have to dynamically allocate your elements if they're larger than a machine word (32 or 64 bits). If you dynamically allocate elements, you shift the (supposed, since you haven't profiled your program and are basically doing premature optimization) bottleneck to the memory allocator, so you then need a lock-free memory allocator, e.g., Streamflow, and have to integrate it with your application.
The requirement that producers or consumers block if the buffer is empty or full suggests that you should use a normal locking data structure, with semaphores or condition variables to make the producers and consumers block until data is available. Lock-free code generally doesn't block on such conditions - it spins or abandons operations that can't be completed, rather than blocking via the OS. (If you can afford to wait until another thread produces or consumes data, then why is waiting on a lock for another thread to finish updating the data structure any worse?)
On (x86/x64) Linux, inter-thread synchronization using mutexes is reasonably cheap if there is no contention. Concentrate on minimizing the time that the producers and consumers need to hold onto their locks. Given that you've said that you only care about the last N recorded data points, I think a circular buffer would do this reasonably well. However, I don't really understand how this fits in with the blocking requirement and the idea of consumers actually consuming (removing) the data they read. (Do you want consumers to only look at the last N data points, and not remove them? Do you want producers to not care if consumers can't keep up, and just overwrite old data?)
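To make the suggestion concrete, here is a minimal sketch of such a locking circular buffer using a pthread mutex and two condition variables, so producers block while the buffer is full and consumers block while it is empty. The names (ring_t, ring_push, ring_pop), the double payload, and the fixed capacity are my own placeholder assumptions, not anything from your question:

#include <pthread.h>
#include <stddef.h>

#define RING_CAPACITY 1024

typedef struct {
    double          items[RING_CAPACITY];
    size_t          head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
} ring_t;

void ring_init(ring_t *r) {
    r->head = r->tail = r->count = 0;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->not_empty, NULL);
    pthread_cond_init(&r->not_full, NULL);
}

/* Producer: blocks while the buffer is full. */
void ring_push(ring_t *r, double value) {
    pthread_mutex_lock(&r->lock);
    while (r->count == RING_CAPACITY)
        pthread_cond_wait(&r->not_full, &r->lock);
    r->items[r->tail] = value;
    r->tail = (r->tail + 1) % RING_CAPACITY;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* Consumer: blocks while the buffer is empty. */
double ring_pop(ring_t *r) {
    pthread_mutex_lock(&r->lock);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->lock);
    double value = r->items[r->head];
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->lock);
    return value;
}

If you would rather have producers overwrite old data when consumers can't keep up, ring_push would advance head (dropping the oldest element) instead of waiting on not_full.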
Also, as Zan Lynx commented, you can aggregate/buffer up your data into bigger chunks when you've got lots of it coming in. You could buffer up a fixed number of points, or all the data received within a certain amount of time. This means that there will be fewer synchronization operations. It does introduce latency, but if you're not using real-time Linux, then you'll have to deal with that to an extent anyway. A rough sketch of this batching idea is shown below.
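Here is that sketch, reusing the ring_t type from the previous snippet; the chunk size of 64, the produce name, and the read_point callback are arbitrary assumptions. The producer accumulates points in a thread-local array and takes the lock only once per chunk:

#define BATCH_SIZE 64   /* arbitrary; tune against your latency budget */

/* Producer-side batching: collect points locally, then hand the whole
   chunk to the shared ring under a single lock acquisition. */
void produce(ring_t *r, double (*read_point)(void)) {
    double batch[BATCH_SIZE];
    for (;;) {
        for (size_t i = 0; i < BATCH_SIZE; i++)
            batch[i] = read_point();
        pthread_mutex_lock(&r->lock);
        for (size_t i = 0; i < BATCH_SIZE; i++) {
            while (r->count == RING_CAPACITY)
                pthread_cond_wait(&r->not_full, &r->lock);
            r->items[r->tail] = batch[i];
            r->tail = (r->tail + 1) % RING_CAPACITY;
            r->count++;
        }
        /* Several items were added, so wake all waiting consumers. */
        pthread_cond_broadcast(&r->not_empty);
        pthread_mutex_unlock(&r->lock);
    }
}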
You may try lfqueue. It is simple to use, and it has a circular, lock-free design.
#include <assert.h>
#include <stdlib.h>
#include "lfqueue.h"

int main(void) {
    int i = 0;
    int *int_data, *ret;
    lfqueue_t results;

    lfqueue_init(&results);

    /** Wrap this scope in multithreaded testing **/
    int_data = (int *) malloc(sizeof(int));
    assert(int_data != NULL);
    *int_data = i++;

    /* Enqueue: retry until it succeeds */
    while (lfqueue_enq(&results, int_data) != 0) ;

    /* Dequeue: spin until an item is available */
    while ((ret = lfqueue_deq(&results)) == NULL) ;
    // printf("%d\n", *(int *) ret);
    free(ret);
    /** End **/

    lfqueue_clear(&results);
    return 0;
}
Just for completeness: there's a well-tested lock-free circular buffer in OtlContainers, but it is written in Delphi (TOmniBaseBoundedQueue is the circular buffer and TOmniBaseBoundedStack is a bounded stack). There's also an unbounded queue in the same unit (TOmniBaseQueue). The unbounded queue is described in Dynamic lock-free queue – doing it right. The initial implementation of the bounded queue (circular buffer) was described in A lock-free queue, finally! but the code has been updated since then.
Although this is an old question, no one mentioned DPDK's lockless ring buffer. It's a high throughput ring buffer that supports multiple producers and multiple consumers. It also provides single consumer and single producer modes, and the ring buffer is wait-free in SPSC mode. It's written in C and supports multiple architectures.
In addition, it supports Bulk and Burst modes, where items can be enqueued/dequeued in batches. The design lets multiple consumers or multiple producers write to the queue at the same time by simply reserving space through moving an atomic pointer.
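To give a sense of the API, here is a hedged sketch of the basic single enqueue/dequeue calls. The rte_ring functions have changed signatures in places across DPDK releases, so treat this as approximate and check the headers for your version; the ring name and size are my own choices:

#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_ring.h>

int main(int argc, char **argv) {
    /* Initialise DPDK's environment abstraction layer first. */
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    /* The ring size must be a power of two; the SP/SC flags select the
       wait-free single-producer/single-consumer mode mentioned above. */
    struct rte_ring *ring = rte_ring_create("feed_ring", 1024,
                                            rte_socket_id(),
                                            RING_F_SP_ENQ | RING_F_SC_DEQ);
    if (ring == NULL)
        return -1;

    int sample = 42;
    void *out = NULL;

    /* Single enqueue/dequeue of one object pointer; both return 0 on success. */
    if (rte_ring_enqueue(ring, &sample) == 0 &&
        rte_ring_dequeue(ring, &out) == 0) {
        /* *(int *)out is now 42 */
    }

    rte_ring_free(ring);
    return 0;
}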
Check out Disruptor (How to use it), which is a ring buffer that multiple threads can subscribe to.