I\'ve been running some experiments with the openmp framework and found some odd results I\'m not sure I know how to explain.
My goal is to create this huge matrix a
You are right about vector::resize() internally calling malloc. Implementation-wise malloc is fairly complicated. I can see multiple places where malloc can lead to contention in a multi-threaded environment.
malloc probably keeps a global data structure in userspace to manage the user's heap address space. This global data structure would need to be protected against concurrent access and modification. Some allocators have optimizations to alleviate the number of times this global data structure is accessed... I don't know how far has Ubuntu come along.
malloc allocates address space. So when you actually begin to touch the allocated memory you would go through a "soft page fault" which is a page fault which allows the OS kernel to allocate the backing RAM for the allocated address space. This can be expensive because of the trip to the kernel and would require the kernel to take some global locks to access its own global RAM resource data structures.
the user space allocator probably keeps some allocated space to give out new allocations from. However, once those allocations run out the allocator would need to go back to the kernel and allocate some more address space from the kernel. This is also expensive and would require a trip to the kernel and the kernel taking some global locks to access its global address space management related data structures.
Bottomline, these interactions could be fairly complicated. If you are running into these bottlenecks I would suggest that you simply "pre-allocate" your memory. This would involve allocating it and then touching all of it (all from a single thread) so that you can use that memory later from all your threads without running into lock contention at user or kernel level.