I'm currently working on upgrading and restructuring an OpenGL render engine. The engine is used for visualising large scenes of architectural data (buildings with interiors), and the number of objects can become rather large. As with any building, there are a lot of objects hidden behind or inside walls, and you naturally only see the objects in the same room as you, or the exterior if you are outside. This leaves a large number of objects that should be removed through occlusion culling and frustum culling.
At the same time, there is a lot of repetitive geometry that can be grouped into render batches, and many objects that can be drawn with instanced rendering.
The way I see it, it is difficult to combine render batching and culling optimally. If you batch too many objects into the same VBO, it becomes hard to cull on the CPU, because the whole batch can only be skipped when none of its objects are visible. At the same time, if you skip culling on the CPU, the GPU ends up processing many objects that are not visible. And if you skip batching completely to make CPU culling easier, you end up with an undesirably high number of draw calls.
I have done some research into existing techniques and theories on how these problems are solved in modern graphics, but I have not been able to find a concrete solution. One idea a colleague and I came up with is to restrict batches to objects that are relatively close to each other, e.g. all chairs in a room or within a radius of n meters. This could be simplified and optimized using octrees.
Does anyone have pointers to techniques for scene management, culling, batching etc. used in state-of-the-art graphics engines?
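The "batch only nearby objects" idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions: it swaps the octree for a simpler uniform grid, and the names `Object`, `meshId`, `Batch`, `buildSpatialBatches` and `cellSize` are hypothetical, not part of any engine or library. Each object is assigned to the grid cell containing its bounding-box center, objects sharing both a cell and a mesh form one batch, and every batch keeps the union of its members' bounds so the CPU can cull the whole batch with a single test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

struct AABB {
    float min[3];
    float max[3];
};

struct Object {
    int meshId;   // hypothetical: identifies shared geometry/material (e.g. "chair")
    AABB bounds;  // world-space bounding box
};

struct Batch {
    int meshId;
    AABB bounds;               // union of member bounds, used to cull the whole batch
    std::vector<int> objects;  // indices into the scene's object array
};

// Key: integer grid cell of the object's center plus its mesh id.
using CellKey = std::tuple<int, int, int, int>;

// Groups objects that share a mesh and lie in the same grid cell (cellSize in meters).
std::vector<Batch> buildSpatialBatches(const std::vector<Object>& objects, float cellSize) {
    assert(cellSize > 0.0f);
    std::map<CellKey, Batch> cells;
    for (int i = 0; i < static_cast<int>(objects.size()); ++i) {
        const Object& o = objects[i];
        int c[3];
        for (int a = 0; a < 3; ++a) {
            float center = 0.5f * (o.bounds.min[a] + o.bounds.max[a]);
            c[a] = static_cast<int>(std::floor(center / cellSize));
        }
        CellKey key{c[0], c[1], c[2], o.meshId};
        auto it = cells.find(key);
        if (it == cells.end()) {
            cells.emplace(key, Batch{o.meshId, o.bounds, {i}});
        } else {
            // Grow the batch bounds to enclose the new member.
            Batch& b = it->second;
            for (int a = 0; a < 3; ++a) {
                b.bounds.min[a] = std::min(b.bounds.min[a], o.bounds.min[a]);
                b.bounds.max[a] = std::max(b.bounds.max[a], o.bounds.max[a]);
            }
            b.objects.push_back(i);
        }
    }
    std::vector<Batch> result;
    result.reserve(cells.size());
    for (auto& kv : cells) result.push_back(std::move(kv.second));
    return result;
}
```

With a cell size of 5 m, chairs in one room and chairs a room further away land in different batches, so a batch whose merged bounds fail the frustum or occlusion test can be skipped as a whole. An object spanning several cells is still assigned only by its center, which inflates that batch's bounds; an octree that places objects in the smallest node fully containing them avoids this at the cost of more bookkeeping.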
There's lots of information about frustum and occlusion culling on the internet. Most of it comes from game developers. Here's a list of some articles that will get you started:
- http://de.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3
- http://de.slideshare.net/TiagoAlexSousa/secrets-of-cryengine-3-graphics-technology
- http://de.slideshare.net/Umbra3/siggraph-2011-occlusion-culling-in-alan-wake
- http://de.slideshare.net/Umbra3/visibility-optimization-for-games
- http://de.slideshare.net/Umbra3/chen-silvennoinen-tatarchuk-polygon-soup-worlds-siggraph-2011-advances-in-realtime-rendering-course
- http://de.slideshare.net/DICEStudio/culling-the-battlefield-data-oriented-design-in-practice
- http://www.cse.chalmers.se/~uffe/vfc.pdf
My (pretty fast) renderer works roughly like this:
- Collection: Send all props you want to render to the renderer.
- Frustum culling: The renderer culls the invisible props from the list using multiple threads in parallel.
- Occlusion culling: Now you could do occlusion culling on the CPU (I haven't implemented it yet, because I don't need it now). Detailed information on how to do it efficiently can be found in the Killzone and Crysis slides. One solution would be to read back the depth buffer of the previous frame from the GPU and then rasterize the bounding boxes of the objects on top of it to check whether each object is visible.
- Splitting: Now that you know which objects are visible and actually have to be rendered, split them by mesh, because each mesh has a different material or texture (otherwise they would have been combined into a single mesh).
- Batching: Now you have a list of meshes to render. You can sort them:
  - by depth, front to back (this can be done on the prop level instead of the mesh level), to save fill rate (I don't recommend doing this if your fragment shaders are very simple).
  - by mesh (because there might be multiple instances of the same mesh, and it would make it easy to add instancing).
  - by texture, because texture switches are very costly.
- Rendering: Iterate through your partitioned meshes and render them.
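The frustum culling, splitting and sorting steps above can be sketched as follows. This is a minimal single-threaded sketch, not the answer's actual renderer: occlusion culling is left out, the frustum is given as six planes whose normals point inward, and `DrawItem`, `textureId`, `meshId`, `viewDepth`, `DrawRun` and `cullAndBatch` are hypothetical names. The sort priority (texture, then mesh, then depth) is one possible choice, assumed here because texture switches are the most expensive of the three.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Plane: dot(n, p) + d >= 0 means "inside"; normals assumed to point into the frustum.
struct Plane { float nx, ny, nz, d; };

struct AABB { float min[3]; float max[3]; };

struct DrawItem {
    std::uint32_t textureId;  // hypothetical handles, e.g. mapped to OpenGL texture/VAO names
    std::uint32_t meshId;
    float viewDepth;          // distance from the camera, used for front-to-back order
    AABB bounds;
};

// Conservative AABB-vs-frustum test: for each plane take the box corner that lies
// furthest along the plane normal; if even that corner is behind the plane, the box is outside.
bool intersectsFrustum(const AABB& b, const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        float x = p.nx >= 0 ? b.max[0] : b.min[0];
        float y = p.ny >= 0 ? b.max[1] : b.min[1];
        float z = p.nz >= 0 ? b.max[2] : b.min[2];
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0) return false;
    }
    return true;
}

// A run of consecutive items sharing texture and mesh: one (instanced) draw call.
struct DrawRun { std::uint32_t textureId, meshId; std::size_t first, count; };

std::vector<DrawRun> cullAndBatch(std::vector<DrawItem>& items,
                                  const std::array<Plane, 6>& planes) {
    // 1. Frustum culling (single-threaded here; the answer splits this across threads).
    items.erase(std::remove_if(items.begin(), items.end(),
                               [&](const DrawItem& it) { return !intersectsFrustum(it.bounds, planes); }),
                items.end());
    // 2. Sort by texture, then mesh, then front-to-back depth.
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        return std::tie(a.textureId, a.meshId, a.viewDepth) <
               std::tie(b.textureId, b.meshId, b.viewDepth);
    });
    // 3. Collapse neighbours with equal (texture, mesh) into runs.
    std::vector<DrawRun> runs;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (!runs.empty() && runs.back().textureId == items[i].textureId &&
            runs.back().meshId == items[i].meshId) {
            ++runs.back().count;
        } else {
            runs.push_back({items[i].textureId, items[i].meshId, i, 1});
        }
    }
    // Debug check: the runs cover the surviving items exactly.
    assert(runs.empty() || runs.back().first + runs.back().count == items.size());
    return runs;
}
```

In an OpenGL renderer, each `DrawRun` would then correspond to binding one texture and one mesh and issuing a single draw, for example `glDrawElementsInstanced` with `count` instances whose per-instance data is taken from `items[first]` onward.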
And as "Full Frontal Nudity" already said: There's no perfect solution.
Source: https://stackoverflow.com/questions/17233172/state-of-the-art-culling-and-batching-techniques-in-rendering