Does interleaving in VBOs speed up performance when using VAOs

一个人的身影 2020-12-09 11:29

You usually get a speed up when you use interleaved VBOs instead of using multiple VBOs. Is this also valid when using VAOs?

Because it's much more convenient to ha…

2 Answers
  • 2020-12-09 11:34

    A VAO doesn't hold any vertex attribute data. It's a container object for a set of vertex arrays which describe how to pull data from zero, one or multiple buffer objects (these are the actual vertex arrays you define with VertexAttribPointer() (pre-GL4.3) or VertexAttribFormat(), VertexAttribBinding() and BindVertexBuffer() (GL4.3+)), the enable state for each of those vertex arrays, and possibly an ELEMENT_ARRAY_BUFFER_BINDING. See tables 23.3 and 23.4 of the GL 4.4 core specification for details.

    The ARRAY_BUFFER_BINDING is recorded separately for each vertex array, i.e. for each VertexAttribPointer() invocation per attribute index. This way you can associate an attribute index of the VAO with multiple buffer objects and switch between which buffers to pull from using {Enable|Disable}VertexAttribArray(), or by distributing buffers across attribute indices and choosing appropriate attribute locations for your shaders - either with glBindAttribLocation() or with explicit attribute locations inside your shader (the latter is superior).
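    As a small illustration of those two options (the attribute names, the locations and the program object are hypothetical, and a GL 3.3+ context is assumed):

        /* Option A: explicit locations inside the shader (preferred) */
        const char *vs_src =
            "#version 330 core\n"
            "layout(location = 0) in vec3 aPosition;\n"
            "layout(location = 1) in vec3 aNormal;\n"
            "void main() { gl_Position = vec4(aPosition + 0.0 * aNormal, 1.0); }\n";

        /* Option B: pin the locations from the API before linking;
           'program' is assumed to be a program object with shaders already attached */
        glBindAttribLocation(program, 0, "aPosition");
        glBindAttribLocation(program, 1, "aNormal");
        glLinkProgram(program);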

    Why all this blabbering about VAOs? Because there is no detrimental effect of using VAOs, and the layout of a VBO's buffer store and how quickly vertices are pulled have nothing to do with VAOs. VAOs are state containers, nothing more, nothing less. You still need buffer storage to back any vertex pulling, and you can interleave your data just like you did without VAOs. All you need to do is reflect the interleaved memory layout with your vertex arrays. So in essence, except for recording vertex array state, nothing changes.
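    A minimal sketch of what "reflecting the interleaved memory layout with your vertex arrays" looks like pre-GL4.3, assuming a GL 3.3+ context, a loader such as GLEW, and hypothetical attribute locations 0-2:

        #include <stddef.h>   /* offsetof */
        #include <GL/glew.h>  /* or whatever loader you already use */

        typedef struct {
            float pos[3];
            float normal[3];
            float uv[2];
        } Vertex;  /* interleaved layout: 32 bytes per vertex */

        GLuint make_interleaved_vao(const Vertex *vertices, GLsizei vertexCount)
        {
            GLuint vao, vbo;
            glGenVertexArrays(1, &vao);
            glGenBuffers(1, &vbo);

            glBindVertexArray(vao);               /* start recording vertex array state */
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(Vertex), vertices, GL_STATIC_DRAW);

            /* one attribute per struct member, all sharing the same buffer and stride */
            glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void *)offsetof(Vertex, pos));
            glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void *)offsetof(Vertex, normal));
            glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void *)offsetof(Vertex, uv));
            glEnableVertexAttribArray(0);
            glEnableVertexAttribArray(1);
            glEnableVertexAttribArray(2);

            glBindVertexArray(0);                 /* the VAO now remembers all of the above */
            return vao;
        }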

    What you gain by using VAOs is a way to switch more or less quickly between sets of state and associated buffer objects without setting up vertex arrays every time you switch a buffer object. You therefore save API calls. Also, when binding a VAO, each vertex array still has its ARRAY_BUFFER_BINDING, so there is no need to call BindBuffer() again, saving further API calls. That's it.
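    In the draw loop that boils down to one bind per object instead of re-specifying every pointer and enable; a sketch with a hypothetical Mesh record:

        /* hypothetical mesh record: just a VAO handle and a vertex count */
        typedef struct { GLuint vao; GLsizei vertexCount; } Mesh;

        void draw_meshes(const Mesh *meshes, size_t meshCount)
        {
            for (size_t i = 0; i < meshCount; ++i) {
                /* one bind restores all recorded pointers, enables and the element buffer binding */
                glBindVertexArray(meshes[i].vao);
                glDrawArrays(GL_TRIANGLES, 0, meshes[i].vertexCount);
            }
            glBindVertexArray(0);
        }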

    You neither gain nor lose anything in regard to vertex pulling performance because of a VAO, at least not in theory. You can, however, lose overall performance if you inconsiderately switch VAOs around like crazy.

    BTW, using VAOs is also mandatory with GL 3.2 and higher core profile contexts, so your question is moot if you're not going for a compatibility profile.
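    The bare minimum for a core profile context is therefore something like this, even if you only ever use a single layout (a sketch):

        GLuint vao;
        glGenVertexArrays(1, &vao);
        glBindVertexArray(vao);  /* core profile: glVertexAttribPointer and glDraw* error out with VAO 0 bound */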

    In general, when you're unsure about performance: don't guess, always profile! That's especially true when using OpenGL.

  • 2020-12-09 11:45

    VAOs

    • For sharing larger data sets, a dedicated buffer containing a single vertex (attrib) array is surely a way to go, while one could still interleave specific arrays in another buffer and combine them using a VAO (see the sketch after this list).

    • A VAO handles the binding of all those buffers and the vertex (attrib) array states, such as array buffer bindings and attrib entries with (buffer) pointers and enable/disable flags. Aside from its convenience, it is designed to do this job quickly, not to mention the single API call that changes all states at once, without the tedious enabling and disabling of attrib arrays. It basically does what we had to do manually before. However, with my own VAO-like implementation, I could not measure any performance loss, even when doing lots of binds. From my point of view, the major advantage is its convenience.

    So, a VAO doesn't decide on drawing performance in terms of glDraw*, but it can have an impact on the overhead of state changes.
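    The sketch referenced above: one VAO combining a shared, position-only buffer with a second, interleaved buffer. The buffer parameters, the attribute locations and the [normal | uv] layout are made up for the example; a GL 3.3+ context is assumed.

        GLuint combine_shared_and_interleaved(GLuint sharedPositionVbo, GLuint interleavedVbo)
        {
            GLuint vao;
            glGenVertexArrays(1, &vao);
            glBindVertexArray(vao);

            /* attribute 0 pulls from the shared, tightly packed position buffer */
            glBindBuffer(GL_ARRAY_BUFFER, sharedPositionVbo);
            glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void *)0);
            glEnableVertexAttribArray(0);

            /* attributes 1 and 2 pull from the interleaved [normal | uv] buffer */
            glBindBuffer(GL_ARRAY_BUFFER, interleavedVbo);
            glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 5 * sizeof(float), (void *)0);
            glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 5 * sizeof(float), (void *)(3 * sizeof(float)));
            glEnableVertexAttribArray(1);
            glEnableVertexAttribArray(2);

            glBindVertexArray(0);  /* each attribute keeps the buffer that was bound when its pointer was set */
            return vao;
        }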

    Interleaved data formats...

    • ...cause less GPU cache pressure, because the vertex coordinates and attributes of a single vertex aren't scattered all over memory. They fit consecutively into a few cache lines, whereas scattered attributes could cause more cache updates and therefore evictions. The worst-case scenario could be one (attribute) element per cache line at a time because of distant memory locations, while vertices get pulled in a non-deterministic/non-contiguous manner, where possibly no prediction and prefetching kicks in. GPUs are very similar to CPUs in this regard.

    • ...are also very useful for various external formats that match the deprecated interleaved formats, where datasets from compatible data sources can be read straight into mapped GPU memory. I ended up re-implementing these interleaved formats with the current API for exactly those reasons.

    • ...should be laid out alignment-friendly, just like simple arrays. Mixing various data types with different size/alignment requirements may need padding to be GPU and CPU friendly (see the struct sketch after this list). This is the only downside I know of, apart from the more difficult implementation.

    • ...do not prevent you from pointing to single attrib arrays in them for sharing.

    Interleaving will most probably improve draw performance.
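    A hedged sketch of an alignment-friendly interleaved vertex, as referenced in the alignment bullet above. The attribute set, the RGBA8 color and the padding of the stride to 32 bytes are assumptions, and whether the padding actually pays off is exactly the kind of thing to profile. The pointer setup assumes the target VAO and VBO are already bound.

        typedef struct {
            float         pos[3];    /* 12 bytes */
            float         uv[2];     /*  8 bytes */
            unsigned char color[4];  /*  4 bytes, RGBA8, normalized in the attrib setup below */
            unsigned char pad[8];    /*  pad the stride from 24 to 32 bytes */
        } PackedVertex;

        /* matching vertex arrays: the same stride for every attribute */
        glVertexAttribPointer(0, 3, GL_FLOAT,         GL_FALSE, sizeof(PackedVertex), (void *)offsetof(PackedVertex, pos));
        glVertexAttribPointer(1, 2, GL_FLOAT,         GL_FALSE, sizeof(PackedVertex), (void *)offsetof(PackedVertex, uv));
        glVertexAttribPointer(2, 4, GL_UNSIGNED_BYTE, GL_TRUE,  sizeof(PackedVertex), (void *)offsetof(PackedVertex, color));
        glEnableVertexAttribArray(0);
        glEnableVertexAttribArray(1);
        glEnableVertexAttribArray(2);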

    Conclusion:

    From what I experienced, it is best to have cleanly designed interfaces for vertex data sources and 'compiled' VAOs, where one can encapsulate the VAO factory appropriately. This factory can then be altered to initialize interleaved, separate or mixed vertex buffer layouts from data sources, without breaking anything. This is especially useful for profiling.
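    A rough sketch of such a factory, with a hypothetical AttribDesc layout description as the only varying part; interleaved, separate or mixed layouts just produce different descriptions, so the rest of the code never changes.

        typedef struct {
            GLuint    buffer;      /* VBO providing this attribute */
            GLuint    location;    /* attribute index */
            GLint     components;  /* 1..4 */
            GLenum    type;        /* GL_FLOAT, GL_UNSIGNED_BYTE, ... */
            GLboolean normalized;
            GLsizei   stride;      /* 0 means tightly packed */
            size_t    offset;      /* byte offset into the buffer */
        } AttribDesc;

        GLuint build_vao(const AttribDesc *attribs, size_t count)
        {
            GLuint vao;
            glGenVertexArrays(1, &vao);
            glBindVertexArray(vao);
            for (size_t i = 0; i < count; ++i) {
                glBindBuffer(GL_ARRAY_BUFFER, attribs[i].buffer);
                glVertexAttribPointer(attribs[i].location, attribs[i].components,
                                      attribs[i].type, attribs[i].normalized,
                                      attribs[i].stride, (const void *)attribs[i].offset);
                glEnableVertexAttribArray(attribs[i].location);
            }
            glBindVertexArray(0);
            return vao;
        }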

    After all that babbling, my advice is simple: Proper and sufficiently abstracted design before and for optimization.
