Multi-thread rendering vs command pools

After all, being able to build command buffers in parallel is one of the selling points of Vulkan.

From the spec (section 5.1, Command Pools):

Command pools are application-synchronized, meaning that a command pool must not be used concurrently in multiple threads. That includes use via recording commands on any command buffers allocated from the pool, as well as operations that allocate, free, and reset command buffers or the pool itself.

Doesn't this kind of kill the whole purpose of command pools when it comes to recording in parallel? If you intend to record in parallel, then you would be better off having a separate pool for each thread, isn't that right?

I would understand it if you pre-record command buffers all allocated from the same pool (in one thread) and then execute them in parallel. That has the advantage of amortized resource creation costs as well as parallel execution. However, parallel recording and command pools don't seem to match very well.

I don't personally know why you wouldn't just pre-record everything. So why is building command buffers in parallel so needed? And would you then really have to use one pool per thread?

If you intend to record in parallel, then you would be better off having a separate pool for each thread, isn't that right?

I don't see how having a separate pool per thread "kills the whole purpose of command pools when it comes to recording in parallel". Indeed, it helps quite a bit, since each thread can manage its own command pool as it sees fit.

Consider the structural difference between, say, a descriptor pool and a command pool. With a descriptor pool, you basically tell it exactly what you will allocate from it. VkDescriptorPoolCreateInfo provides detailed information which allows implementations to allocate up-front exactly how much memory you'll use for each pool. And you cannot allocate more than this from a descriptor pool.

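By contrast, VkCommandPoolCreateInfo contains... nothing. Oh, you tell it which queue family its command buffers will be submitted to. You say whether the command buffers will be short-lived (VK_COMMAND_POOL_CREATE_TRANSIENT_BIT) and whether they can be reset individually (VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT). (Whether a command buffer is primary or secondary is chosen per allocation, in VkCommandBufferAllocateInfo, not per pool.) But other than that, you say nothing about the contents of the command buffers. You don't even tell it how many buffers you'll allocate.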

Descriptor pools are intended to be fixed: allocated from as needed, but only up to a quantity set at construction time. Command pools are intended to be very dynamic: allocated from as needed for your particular use cases.

Think of it as each pool having its own malloc/free. Since the user is forced to synchronize access to pools and their buffers, no vkCmd* function has to synchronize internally when it allocates memory. That makes command building faster, which helps threading. When a thread decides to reset its command pool, it doesn't have to lock any mutexes or take any other such measures.
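To make the malloc/free analogy concrete, here is a minimal sketch in C of a hypothetical per-thread "pool" implemented as a bump (arena) allocator. It is not Vulkan code and not a claim about how any driver implements VkCommandPool; `cmd_pool`, `pool_alloc` and `pool_reset` are illustrative names. Because only one thread ever touches a given pool, allocation is just a pointer bump with no lock, and a reset releases everything at once, much as vkResetCommandPool recycles the resources of every command buffer allocated from the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread "command pool": a fixed-capacity bump allocator.
 * It is owned by exactly one thread, so it contains no mutex or atomics. */
typedef struct {
    unsigned char *base;  /* backing storage obtained once, up front */
    size_t capacity;      /* total bytes available */
    size_t used;          /* bytes handed out since the last reset */
} cmd_pool;

int pool_init(cmd_pool *p, size_t capacity) {
    p->base = malloc(capacity);
    p->capacity = capacity;
    p->used = 0;
    return p->base != NULL;
}

/* Allocate `size` bytes at an offset aligned to `align` (a power of two).
 * Offsets are aligned relative to `base`; malloc's own alignment makes the
 * address aligned too for align <= alignof(max_align_t).
 * Returns NULL when the pool is exhausted; a real pool would grow instead. */
void *pool_alloc(cmd_pool *p, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (p->used + align - 1) & ~(align - 1);
    if (start > p->capacity || size > p->capacity - start) return NULL;
    p->used = start + size;
    return p->base + start;
}

/* Release every allocation at once: no per-object frees, no locking. */
void pool_reset(cmd_pool *p) { p->used = 0; }

void pool_destroy(cmd_pool *p) {
    free(p->base);
    p->base = NULL;
    p->capacity = p->used = 0;
}
```

Each worker thread would own one such pool (or two, for double-buffering) and hand it to whatever code records that thread's commands.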

There's nothing conceptually wrong with having one command pool per thread. Indeed, having two per thread (double-buffering) makes even more sense.

I don't personally know why you wouldn't just pre-record everything.

Because you're not making a static tech demo.

I guess this comes from lack of experience, but I imagined parallel recording would look like "threads 2-N record secondary command buffers, thread 1 calls all of them in one primary command buffer", in which case there is only one command buffer per thread. That was why I said it kills the purpose of command pools: you are only making a single allocation per pool.
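The arrangement described in that comment (workers record secondaries, thread 1 gathers them into a primary) can be modelled in C with pthreads. This is a sketch under simplifying assumptions, not Vulkan code: `secondary_cb`, `record_secondary` and `build_primary` are hypothetical, and a "command" is just an int. The point it illustrates is that the workers record concurrently without locks because each writes only its own buffer, while the primary consumes them in a fixed array order, the same way vkCmdExecuteCommands runs its secondary command buffers in the order they are listed, regardless of which thread finished recording first.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NUM_WORKERS 3
#define MAX_CMDS 16

/* Hypothetical stand-in for a secondary command buffer: a list of opcodes.
 * Each worker owns exactly one, so recording needs no locking. */
typedef struct {
    int id;              /* which part of the scene this worker draws */
    int cmds[MAX_CMDS];
    int count;
} secondary_cb;

/* Worker thread: pretend to record two draws tagged with the worker id. */
static void *record_secondary(void *arg) {
    secondary_cb *cb = arg;
    for (int i = 0; i < 2; i++) {
        assert(cb->count < MAX_CMDS);
        cb->cmds[cb->count++] = cb->id * 100 + i;
    }
    return NULL;
}

/* Thread 1, after joining all workers: append the secondaries to the
 * primary in array order. Returns the number of commands copied. */
int build_primary(int *primary, int cap, const secondary_cb *secs, int n) {
    int total = 0;
    for (int s = 0; s < n; s++) {
        assert(total + secs[s].count <= cap);
        memcpy(primary + total, secs[s].cmds,
               (size_t)secs[s].count * sizeof(int));
        total += secs[s].count;
    }
    return total;
}
```

With workers 2, 3 and 4 each recording two commands, the primary always reads 200, 201, 300, 301, 400, 401, whatever the scheduling.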

That's certainly a viable form of recording command buffers in parallel, but it is not the only one, and there are several things you've missed.

If you're doing deferred rendering, the thread that builds the CB for the lighting passes will finish its work much sooner than one of the threads responsible for (part of) the geometry pass. So a well-designed multithreaded system has to apportion work to threads based on need, not based on some fixed arrangement. As a result, an individual thread will often end up building multiple command buffers.
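One common way to apportion work by need is a shared job counter that threads claim from until it runs dry. The C sketch below assumes that scheme; `job_queue`, `claim_job` and `worker_main` are hypothetical names, and the `recorded` array stands in for the command buffers a thread would allocate from its own pool. Only the counter is shared (and atomic); everything a worker records stays thread-local, and a thread that finishes early simply claims another job, so it ends up building several command buffers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_JOBS 64

/* Hypothetical shared job list, e.g. one job per chunk of the geometry
 * pass plus one for the lighting pass. count must not exceed MAX_JOBS. */
typedef struct {
    atomic_int next;   /* index of the next unclaimed job */
    int count;         /* total number of jobs this frame */
} job_queue;

/* Per-worker state. `recorded` stands in for the command buffers this
 * thread built from its own pool; only the owning thread writes it. */
typedef struct {
    job_queue *queue;
    int recorded[MAX_JOBS];
    int num_recorded;
} worker;

/* Claim the next job index, or return -1 when all jobs are taken. */
int claim_job(job_queue *q) {
    int i = atomic_fetch_add(&q->next, 1);
    return i < q->count ? i : -1;
}

/* Worker loop: a fast thread just comes back for more work. */
void *worker_main(void *arg) {
    worker *w = arg;
    int job;
    while ((job = claim_job(w->queue)) != -1) {
        /* A real renderer would allocate a command buffer from this
         * thread's pool here and record the job's commands into it. */
        assert(w->num_recorded < MAX_JOBS);
        w->recorded[w->num_recorded++] = job;
    }
    return NULL;
}
```

Every job is recorded exactly once, but how the jobs split between threads depends on timing, which is exactly why a per-thread pool has to cope with a varying number of command buffers.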

And even if that were not the case, you forget about buffering. When it comes time to build the CBs for the next frame, you can't just overwrite the existing ones, because they're probably still in the queue doing work. So each thread will need at least two CBs: the one that's currently being executed and the one that's currently being built.
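The buffering rule can be sketched as a small ring of per-thread frame slots. This C model assumes two frames in flight and replaces the real fence with a boolean flag; `frame_slot`, `begin_frame`, `submit_frame` and `gpu_finished` are hypothetical. In a Vulkan renderer each slot would hold a VkCommandPool and a fence, and instead of returning NULL the thread would block in vkWaitForFences before resetting that slot's pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAMES_IN_FLIGHT 2  /* double-buffering: one executing, one recording */

/* Hypothetical per-thread frame slot; the fence is modelled as a flag. */
typedef struct {
    bool in_flight;        /* submitted and not yet finished on the "GPU" */
    uint64_t frame_built;  /* frame number last recorded into this slot */
} frame_slot;

typedef struct {
    frame_slot slots[FRAMES_IN_FLIGHT];
} thread_frames;

/* Pick the slot for `frame`. Its old contents may only be discarded once
 * the GPU is done with them; NULL means "still executing, wait first". */
frame_slot *begin_frame(thread_frames *t, uint64_t frame) {
    frame_slot *s = &t->slots[frame % FRAMES_IN_FLIGHT];
    if (s->in_flight) return NULL;  /* must not overwrite work in the queue */
    s->frame_built = frame;         /* stands in for pool reset + re-record */
    return s;
}

void submit_frame(frame_slot *s) {
    assert(!s->in_flight);          /* a slot is submitted once per use */
    s->in_flight = true;
}

/* Called when the fence for this slot signals. */
void gpu_finished(frame_slot *s) { s->in_flight = false; }
```

With two slots, frame 2 can reuse slot 0 only after frame 0's work has completed; raising FRAMES_IN_FLIGHT trades memory for more slack between recording and execution.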

And even if that were not the case, command pools allocate all memory associated with a CB. There's a reason why I analogized them to malloc/free. Even if you only use a single CB with a particular pool, the fact that this CB's allocations (which can happen due to any vkCmd* function) never have to synchronize with another thread is a good thing.

So no, this does not in any way inhibit the ability to use multiple threads to build CBs.

If you intend to record in parallel, then you would be better off having a separate pool for each thread, isn't that right?

It is exactly right. That is what your spec quote implies.

I would understand it if you pre-record command buffers all allocated from the same pool (in one thread) and then execute them in parallel.

Vulkan does one better. You can pre-record command buffers (allocated from per-thread pools) in parallel and then execute them in parallel too (if your workload is conducive to that).

I don't personally know why you wouldn't just pre-record everything. So why is building command buffers in parallel so needed?

Because pre-recording everything is hard, especially as your app grows in complexity. At some point it even becomes counterproductive, when you twist the command buffers to make them pre-recordable (e.g. by filling them with empty placeholder bindings, 80% of which will never be used).
It is not necessarily "needed"; Vulkan just lets you choose what you deem best for your app (or parts of it).

Source: https://stackoverflow.com/questions/38318818/multi-thread-rendering-vs-command-pools

Tags

vulkan