I would like to use shared memory between several processes, and to keep using raw pointers (and STL containers).
For this purpose, I am using sh
This is a hard problem. If you are forking a single program to create children, and only the parent and the children will use the memory segment, just be sure to map it before you fork. The children will automatically inherit the mapping from their parent and there's no need to use a fixed address.
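Here is a minimal sketch of the map-before-fork case, assuming Linux or a BSD (`MAP_ANONYMOUS` is not strict POSIX). The function name `read_childs_write` is made up for illustration, and an anonymous shared mapping stands in for whatever segment you actually create:

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>

// Maps shared memory, forks, lets the child write through the inherited raw
// pointer, and returns what the parent reads afterwards (-1 on failure).
int read_childs_write(int value) {
    assert(value >= 0);  // negative values are reserved for the error return

    // Map BEFORE forking so the child inherits the mapping at the same address.
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    int* shared = static_cast<int*>(mem);
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *shared = value;  // same pointer value, still valid in the child
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);  // wait so the child's write has happened
    int seen = *shared;
    munmap(mem, sizeof(int));
    return seen;
}
```

Calling `read_childs_write(42)` in the parent should hand back the value the child stored, without either side ever asking for a fixed address.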
If you aren't, then the first thing to consider is whether you really need to use raw STL containers instead of the boost interprocess containers. That you're already using boost interprocess to allocate the shared memory segment suggests you don't have any problem using boost, so the only advantage I can think of to using STL containers would be so you don't have to port existing code. Keep in mind that for it to work with fixed addresses, the containers and what they contain pointers to (assuming you're working with containers of pointers) will need to be kept in the shared memory space.
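To see why the Boost containers don't need a fixed address: they use Boost.Interprocess's `offset_ptr`, which stores a distance from the pointer's own location instead of an absolute address. The following is not Boost's implementation, only a stripped-down sketch of the same idea. `RelPtr`, `Node` and `link_survives_remap` are hypothetical names, and mapping one temp file twice in a single process stands in for two processes that got different addresses:

```cpp
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <new>

// Self-relative pointer: stores the distance from itself to its target.
template <class T>
struct RelPtr {
    std::ptrdiff_t offset = 0;  // 0 encodes null in this sketch

    void set(T* target) {
        char* self = reinterpret_cast<char*>(this);
        char* dest = reinterpret_cast<char*>(target);
        assert(dest != self);  // offset 0 is reserved for null
        offset = target ? dest - self : 0;
    }
    T* get() {
        if (offset == 0) return nullptr;
        return reinterpret_cast<T*>(reinterpret_cast<char*>(this) + offset);
    }
};

struct Node {
    RelPtr<int> link;
    int value;
};

// Maps the same file at two addresses, writes through the first view and
// follows the link through the second. Returns true if the link still works.
bool link_survives_remap() {
    char path[] = "/tmp/relptr-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return false;
    unlink(path);  // the file lives on until the mappings are gone
    if (ftruncate(fd, sizeof(Node)) != 0) {
        close(fd);
        return false;
    }

    const int prot = PROT_READ | PROT_WRITE;
    void* a = mmap(nullptr, sizeof(Node), prot, MAP_SHARED, fd, 0);
    void* b = mmap(nullptr, sizeof(Node), prot, MAP_SHARED, fd, 0);
    close(fd);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, sizeof(Node));
        if (b != MAP_FAILED) munmap(b, sizeof(Node));
        return false;
    }

    Node* first = new (a) Node{};
    first->value = 7;
    first->link.set(&first->value);

    Node* second = static_cast<Node*>(b);  // same bytes, different address
    int* followed = second->link.get();
    bool ok = (a != b) && followed == &second->value && *followed == 7;

    munmap(a, sizeof(Node));
    munmap(b, sizeof(Node));
    return ok;
}
```

A raw `int*` stored through the first view would still point into the first view when read through the second, which is exactly the problem a fixed address is meant to work around.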
If you're certain that it's what you want, you'll have to figure out some method for them to negotiate an address. Keep in mind that the OS is allowed to reject your desired fixed memory address. It will reject an address if the page at that address has already been mapped into memory or allocated. Because different programs will have allocated different amounts of memory at different times, which pages are available and which are unavailable will vary across your programs.
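If you end up calling `mmap` yourself, be careful how you detect that rejection: plain `MAP_FIXED` on POSIX systems replaces whatever is already mapped at the address instead of failing. One way around that, sketched below, is to pass the address as a hint only and check what comes back. `try_map_at` is a hypothetical helper, and the anonymous private mapping is just a stand-in for your real segment:

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>

// Tries to place a mapping exactly at `wanted`.
// Returns the address on success, nullptr if that spot is unavailable.
void* try_map_at(void* wanted, std::size_t len) {
    assert(len > 0);  // mmap rejects zero-length mappings

    // No MAP_FIXED: the address is only a hint, so nothing gets clobbered.
    void* got = mmap(wanted, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED) return nullptr;

    if (got != wanted) {   // the kernel put it somewhere else
        munmap(got, len);  // undo, and report the address as rejected
        return nullptr;
    }
    return got;
}
```

A `nullptr` result is what a process would report back as "this address doesn't work for me".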
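So you need the programs to reach consensus on a memory address. This means that several addresses might have to be tried and rejected. If it's possible that a new program will become interested sometime after startup, the search for consensus will have to start over again. The algorithm would look something like this: one process, A, proposes an address; every other process tries to map the segment there and reports back whether it succeeded; if any of them failed, A proposes a different address and the round repeats.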
To come up with the addresses A should propose, you could have A map a non-fixed memory segment, see what address it's mapped at, and propose that address. If it's unsatisfactory, map another segment and propose that one instead. You will need to unmap the rejected segments at some point, but not right away: if you unmap and then remap a segment of the same size, chances are the OS will give you the same address back over and over. Keep in mind that you may never reach consensus; there's no guarantee that there's a large enough free range at a common location across all the processes. This could happen if your programs all independently use almost all memory, say if they are backed by a ton of swap (though if you care enough about performance to use shared memory, hopefully you are avoiding swap).
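The proposing side could be sketched roughly like this. `AddressProposer` is a hypothetical class, and it assumes Linux or a BSD for `MAP_ANONYMOUS`. It keeps every rejected placeholder mapped until the negotiation is over, so each call yields a different candidate:

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Generates candidate addresses for a segment of `len` bytes by letting the
// OS place placeholder mappings, and holds them so candidates never repeat.
class AddressProposer {
public:
    explicit AddressProposer(std::size_t len) : len_(len) { assert(len > 0); }
    ~AddressProposer() { release(); }
    AddressProposer(const AddressProposer&) = delete;
    AddressProposer& operator=(const AddressProposer&) = delete;

    // Returns a fresh candidate, or nullptr if the OS can't place one.
    void* next() {
        // PROT_NONE: the placeholder only reserves address space.
        void* p = mmap(nullptr, len_, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return nullptr;
        held_.push_back(p);
        return p;
    }

    // Unmaps every placeholder; call once the negotiation has finished.
    void release() {
        for (void* p : held_) munmap(p, len_);
        held_.clear();
    }

private:
    std::size_t len_;
    std::vector<void*> held_;
};
```

Once everyone has accepted a candidate, A still has its own placeholder sitting at that address, so it has to swap the real segment in there, for example by unmapping the placeholder first. How the proposals and replies travel between processes is left out here.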
All of the above assumes you're in a relatively constrained address space. If you're on 64-bit, a simpler approach could work. Most computers' RAM + swap will be far less than what 64 bits can address, so you could map the memory at a very far-out fixed address that none of the processes is likely to have mapped already. Don't go too far out, though: current 64-bit x86 processors implement only 48 bits of virtual address (despite pointers being 64 bits), and the operating system typically keeps the upper half of that range for the kernel, so a user-space mapping has to sit below 2^47. That is still a ton of room at the time of this writing. However, there's no reason a smart heap allocator couldn't take advantage of the vastness of the address space to reduce its bookkeeping work, so to be truly robust you would still need to build consensus. At the very least you will want the address to be configurable: even if we don't have that much memory anytime soon, between now and then someone else might have the same idea and pick your address.
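Making the address configurable can be as simple as reading it from the environment. In this sketch the variable name `SHM_BASE_ADDR`, the helper names, and the default of 2^45 are all my own assumptions; the default is merely a value that sits inside the user-space range on typical x86-64 Linux, not a recommendation from any library:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

static_assert(sizeof(void*) == 8, "this sketch assumes a 64-bit address space");

// Illustrative default base address (2^45); an assumption, not a standard.
inline void* default_base() {
    return reinterpret_cast<void*>(std::uintptr_t{1} << 45);
}

// Reads a base address from the environment variable `name`.
// Accepts decimal or 0x-prefixed hex; returns `fallback` if the variable is
// unset, empty, zero, or not a clean number.
void* base_address_from_env(const char* name, void* fallback) {
    assert(name != nullptr);  // getenv requires a valid name

    const char* text = std::getenv(name);
    if (text == nullptr || *text == '\0') return fallback;

    errno = 0;
    char* end = nullptr;
    unsigned long long value = std::strtoull(text, &end, 0);
    if (errno != 0 || *end != '\0' || value == 0) return fallback;

    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(value));
}
```

Every process would call something like `base_address_from_env("SHM_BASE_ADDR", default_base())` and then try to map the segment at the result, so a collision can be fixed by changing one setting instead of recompiling.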
To do the bidirectional communication you could use any of sockets, pipes, or another shared memory segment. Your OS may provide other forms of IPC. But strongly consider that you are probably now introducing more complexity than you would have to deal with if you just used the boost interprocess containers ;)