I have rendered a 3D scene in OpenGL viewed from the gluOrtho perspective. In my application I am looking at the front face of a cube of volume 100x70x60mm (which I have as 1000
This can't be done in the context of OpenGL--you'll need some sort of scene graph or other space partitioning scheme working in concert with your application's data structures.
The reason is simple: the frame buffer only stores the color and depth of the fragment nearest to the eye at each pixel location (assuming a normal GL_LESS depth function). The depth value stored in the Z-buffer is used to determine if each subsequent fragment is closer or farther from the eye than the existing fragment, and thus whether the new fragment should replace the old or not. The frame buffer only stores color and depth values from the most recent winner of the depth test, not the entire set of fragments that would have mapped to that pixel location. Indeed, there would be no way to bound the amount of graphics memory required if that were the case.
You're not the first to fall for this misconception, so I say it the most blunt way possible: OpenGL doesn't work that way. OpenGL never(!) deals with objects or any complex scenes. The only thing OpenGL knows about are framebuffers, shaders and single triangles. Whenever you draw an object, usually composed of triangles, OpenGL will only see each triangle at a time. And once something has been drawn to the framebuffer, whatever has been there before is lost.
There are algorithms based on the concepts of rasterizers (like OpenGL is) that decompose a rendered scene into it's parts, depth peeling would be one of them.