Dynamic Lock-free memory allocators

问题

One of the difficulties in writing algorithms or data structures that satisfy lock-free progress guarantees is dynamic memory allocation: calling something like malloc or new isn't guaranteed to be lock-free in a portable manner. However, many lock-free implementations of malloc or new exist, and there are also a variety of lock-free memory allocators that can be used to implement lock-free algorithms/data-structures.

However, I still don't understand how this can actually completely satisfy lock-free progress guarantees, unless you specifically limit your data-structure or algorithm to some pre-allocated static memory pool. But if you need dynamic memory allocation, I don't understand how any alleged lock-free memory allocator can ever be truly lock-free in the long-run. The problem is that no matter how amazing your lock-free malloc or new might be, ultimately you may run out of memory, at which point you have to fall back on asking the operating system for more memory. That means that ultimately you have to call brk() or mmap() or some such low-level equivalent to actually get access to more memory. And there is simply no guarantee that any of these low-level calls are implemented in a lock-free manner.

There's simply no way around this, (unless you're using an ancient OS like MS-DOS that doesn't provide memory protection, or you write your own completely lock-free operating system - two scenarios that are not practical or likely.) So, how can any dynamic memory allocator truly be lock-free?

回答1:

As you have found yourself, the fundamental OS allocator is most likely not lock-free, because it has to deal with multiple processes and all manner of interesting stuff that makes it really hard to not introduce some sort of lock.

For some cases, however, the "lock free memory allocation" doesn't mean "never locks", but "statistically locks so rarely that it doesn't really matter". Which is fine for anything but the most strict real-time systems. If you don't have high contention on your lock, then lock or no lock doesn't really matter - the purpose of lock-free is not really the overhead of the lock itself, but the ease with which it becomes a bottle-neck where every thread or process in the system has to pass through this one place to do anything useful, and as it does so, it has to wait in queue [it may not be a true queue either, it may be "whoever wakes first" or some other mechanism that decides who comes out next, after the current caller].

There are a few different options to solve this:

If you have a memory pool with a finite size, you can ask the OS for all that memory at once, when the software is being started. And after the memory has been chunked out from the OS, it can be used as a lock-free pool. The obvious drawback is that it has a limit to how much memory can be allocated. You then either have to stop allocating (fail the application alltogether, or fail that particular operation).

Of course, in a system like Linux or Windows, there is still no guarantee that memory allocation, in a lock-free scenario, means "instant access to the allocated memory", since the system can and will allocate memory without actual physical memory backing to it, and only once the memory is actually being used, the physical memory page is assigned to it. This may both involve locks and for example disk-I/O to page out some other page to the swap.
For such strict real-time systems that the time of a single system call that may contend for a lock is "too much", the solution is of course to use a dedicated OS, one that has a lock-free allocator inside the OS (or at least one that has a known real-time behaviour that is acceptable - it locks for at most X microsecons [X can be less than 1.0]). Real-time systems often have a pool of memory and fixed size buckets for recycling old allocations, which can be done in a lock-free manner - the buckets are a linked list, so you can insert/remove from that list with atomic compare&exchange operations [possibly with retry, so although it's technically lock-free, it's not zero wait time in contended situations].
Another solution that can work is to have "per thread pools". This can get a bit complicated if you pass data between threads, but if you either accept that "memory freed for reuse may end up in a different thread" (which of course leads to problems along the lines of "all the memory now sits in that one thread that collects and frees the information from many other threads, and all other threads have run out of memory")

来源：https://stackoverflow.com/questions/46355935/dynamic-lock-free-memory-allocators

标签

c++

Linux

algorithm

lock-free