OpenGL (ES 2.0) VBO Performances in a Shared Memory Architecture

I am a desktop GL developer, and I am starting to explore the mobile world.

To avoid misunderstandings, or welcome-but-trivial replies, I can humbly say that I am pretty well aware of the GL and GL|ES machinery.

The short question is: if we are using GL|ES 2.0 in a shared memory architecture, what's the point behind using VBOs against client-side arrays?

More in detail:

Vertex buffers are raw chunks of memory, the driver cannot in any way optimize anything because the access pattern depends on: 1) how the application configures vertex data layout, 2) how a vertex shader consumes buffer content, and 3) we can have lots of vertex shaders operating in different ways, and differently sourcing the same buffer.
Alignment: individual VBO storage could start at addresses that are optimal to the underlying GL system; what if I just force (e.g, respect alignment best practices) client-side arrays allocation to these boundaries?
Tile-Based Rendering vs. Immediate Mode architectures should not come into play: to my understanding, this is not related to my question (i.e., memory access).

I understand that using VBOs can have your code run better/faster in future platforms/hardware without modifying it, but this is not the focus of this question.

Alongside, I also realize that using VBOs in a shared memory architecture doubles memory usage (if you, for some reason, have to keep vertex data at your disposal), and it costs you a memcpy of the data.

As with interleaved vertex arrays, VBO usage has got a great "hype" in developers' forums/blogs/official_technotes without any data supporting those statements (i.e., benchmarks).

Is VBO usage worth it on shared memory architectures?
Do client-side arrays work well?
What do you think/know about this?

I can report that using of VBOs to store vertex data on Android devices gave me zero performance improvement. Tried it on Adreno, Mali400 and PowerVR GPUs. However, we use VBOs considering that it is the best practice for OpenGL ES.

You can find notes about this in our article (Vertex Buffer Objects paragraph).

According to this report, even holding SMA constant it depends on both the OpenGL implementation (some VBO work is secretly done on the CPU) and the size of the VBOs:

http://sarofax.wordpress.com/2011/07/10/vbo-vertex-buffer-object-limitations-on-ios-devices/

I will tell you, what i know about iOS platform. VBO does really improve your performance.

VBO is perfect, if you have a static geometry - once copied, no additional overhead on every draw call. CA will do copy your data from client memory to "gpu memory" every drawcall. It may do data realigning, if you forgot about it.
VBO can be mapped to gpu vie glMapBuffer - it is an asynchronous operation, meaning, it almost has no overhead, but you should remember - when you are mapping\unmapping your buffer, it's better to use it 2 frames after unmap operation - to avoid synchronization
Apple engineers proclaim, that VBO will have better performance, than CA on SGX hardware, even if you'll reupload it every frame - i don't know the details.
VBO is a best practice. CA are deprecated. Better keep in pace with modern trends and stay as much cross-platform, as possible

来源：https://stackoverflow.com/questions/13060299/opengl-es-2-0-vbo-performances-in-a-shared-memory-architecture

标签

memory

mobile

opengl-es

gpu

vbo