Why is this OpenGL ES code slow on iPhone?

梦如初夏 2021-02-04 19:40

I've slightly modified the iPhone SDK's GLSprite example while learning OpenGL ES and it turns out to be quite slow. Even in the simulator (on the hw worst) so I must be doing

5 answers
  • 2021-02-04 19:55

    (I know this is very late, but I couldn't resist. I'll post anyway, in case other people come here looking for advice.)

    This has nothing to do with the texture size. I don't know why people rated up Nils. He seems to have a fundamental misunderstanding of the OpenGL pipeline. He seems to think that for a given triangle, the entire texture is loaded and mapped onto that triangle. The opposite is true.

    Once the triangle has been mapped into the viewport, it is rasterized. For every on-screen pixel that your triangle covers, the fragment shader is called. The default fragment shader (OpenGL ES 1.1, which you are using) will look up the texel that most closely maps (GL_NEAREST) to the pixel you are drawing. It might look up 4 texels, since you are using the higher-quality GL_LINEAR method, which averages the nearest texels. Still, if the pixel count in your triangle is, say, 100, then the most texture bytes you will have to read is 4 (lookups) * 100 (pixels) * 4 (bytes per color) = 1,600 bytes. Far, far less than what Nils was saying. It's amazing that he can make it sound like he actually knows what he's talking about.
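
    The estimate above can be written down as a small helper. This is only a sketch of the arithmetic, not a measurement; the function name and parameters are made up for illustration, and it ignores texture caches, which would lower the real figure further.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on texture bytes fetched while rasterizing one triangle.
 * lookups_per_pixel: 1 for GL_NEAREST, 4 for GL_LINEAR (bilinear filtering).
 * Hypothetical helper for illustration; not part of any OpenGL ES API. */
size_t max_texture_bytes_read(size_t covered_pixels,
                              size_t lookups_per_pixel,
                              size_t bytes_per_texel)
{
    assert(lookups_per_pixel >= 1);
    return covered_pixels * lookups_per_pixel * bytes_per_texel;
}
```

    For the 100-pixel triangle with GL_LINEAR and 4-byte texels, max_texture_bytes_read(100, 4, 4) gives 1600 bytes, regardless of how large the source texture is.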

    WRT the tiled architecture, this is common in embedded OpenGL devices to preserve locality of reference. I believe that each tile gets exposed to each drawing operation, quickly culling most of them. Then the tile decides what to draw on itself. This is going to be much slower when you have blending turned on, as you do. Because you are using large triangles that might overlap and blend with other tiles, the GPU has to do a lot of extra work. If, instead of rendering the example square with alpha edges, you were to render an actual shape (instead of a square picture of the shape), then you could turn off blending for this part of the scene and I bet that would speed things up tremendously.

    If you want to try it, just turn off blending and see how much things speed up, even if they don't look right: glDisable(GL_BLEND);

  • 2021-02-04 19:56

    I'm not familiar with the iPhone, but if it doesn't have dedicated hardware for handling floating point numbers (I suspect it doesn't) then it'd be faster to use integers whenever possible.

    I'm currently developing for Android (which uses OpenGL ES as well) and for instance my vertex array is int instead of float. I can't say how much of a difference it makes, but I guess it's worth a try.
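
    One way to read "int instead of float" is the 16.16 fixed-point format that OpenGL ES 1.x accepts as GL_FIXED for vertex data. That reading is an assumption, since the answer does not say which integer format it uses. A minimal sketch of the conversion, with hypothetical helper names:

```c
#include <stdint.h>

/* 16.16 fixed point: the upper 16 bits hold the integer part,
 * the lower 16 bits hold the fraction. The typedef and both helpers
 * are illustrative names, not OpenGL ES identifiers. */
typedef int32_t fixed16_16;

/* Convert a float to 16.16 fixed point (truncates toward zero). */
fixed16_16 float_to_fixed(float value)
{
    return (fixed16_16)(value * 65536.0f);
}

/* Convert a 16.16 fixed-point value back to float. */
float fixed_to_float(fixed16_16 value)
{
    return (float)value / 65536.0f;
}
```

    Vertex coordinates converted this way would be passed with GL_FIXED instead of GL_FLOAT as the type argument. Whether that helps depends on the hardware, so it is worth measuring rather than assuming.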

  • 2021-02-04 20:03

    Your texture is 512*512 texels at 4 bytes per pixel. That's a megabyte of data. If you render it 200 times per frame you generate a bandwidth load of 200 megabytes per frame.

    With roughly 4 fps you consume 800 MB/second for texture reads alone. Frame buffer and Z-buffer writes need bandwidth as well. Then there is the CPU, and don't underestimate the bandwidth requirements of the display either.
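
    The arithmetic behind that figure, as a sketch. It takes this answer's premise at face value, namely that every draw reads the entire texture once; the function names are hypothetical.

```c
#include <stdint.h>

/* Size of an uncompressed texture in bytes. */
uint64_t texture_bytes(uint64_t width, uint64_t height, uint64_t bytes_per_texel)
{
    return width * height * bytes_per_texel;
}

/* Texture read traffic per second, ASSUMING each draw reads the whole
 * texture once. That premise comes from the answer and is not measured. */
uint64_t texture_read_bytes_per_second(uint64_t tex_bytes,
                                       uint64_t draws_per_frame,
                                       uint64_t frames_per_second)
{
    return tex_bytes * draws_per_frame * frames_per_second;
}
```

    With a 512*512 texture at 4 bytes per texel, 200 draws per frame and 4 fps, this comes to 838,860,800 bytes per second, i.e. 800 MiB/s.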

    RAM on embedded systems (e.g. your iPhone) is not as fast as on a desktop PC. What you see here is a bandwidth starvation effect. The RAM simply can't deliver the data any faster.

    How to cure this problem:

    • pick a sane texture-size. On average you should have 1 texel per pixel. This gives crisp looking textures. I know - it's not always possible. Use common sense.

    • use mipmaps. This takes about 33% extra space but allows the graphics chip to pick a lower-resolution mipmap where possible.

    • Try smaller texture formats. Maybe you can use the ARGB4444 format. This would double the rendering speed. Also take a look at the compressed texture formats. Decompression does not cause a performance drop, as it's done in hardware. In fact the opposite is true: because of the smaller size in memory, the graphics chip can read the texture data faster.
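
    The memory figures in the list above can be checked with a short calculation. This sketch only covers storage size (the 33% mipmap overhead and the halving from a 16-bit format); it says nothing about actual rendering speed. The helper name is made up for illustration.

```c
#include <stddef.h>

/* Total bytes for a square texture plus its full mipmap chain,
 * halving the edge length at each level down to 1x1.
 * Hypothetical helper; not part of any OpenGL ES API. */
size_t mip_chain_bytes(size_t base_size, size_t bytes_per_texel)
{
    size_t total = 0;
    for (size_t size = base_size; size >= 1; size /= 2) {
        total += size * size * bytes_per_texel;
    }
    return total;
}
```

    For a 512*512 texture at 4 bytes per texel this gives 1,398,100 bytes against 1,048,576 bytes for the base level alone, roughly one third more. At 2 bytes per texel (a 16-bit format such as ARGB4444) the whole chain is 699,050 bytes, which is less than the 32-bit base level without mipmaps.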

  • 2021-02-04 20:08

    Apple is very tight-lipped about the specific hardware specs of the iPhone, which seems very strange to those of us coming from a console background. But people have been able to determine that the CPU is a 32-bit RISC ARM1176JZF. The good news is that it has a full floating-point unit, so we can continue writing math and physics code the way we do on most platforms.

    http://gamesfromwithin.com/?p=239

  • 2021-02-04 20:10

    I guess my first try was just a bad (or very good) test. The iPhone has a PowerVR MBX Lite, which is a tile-based graphics processor. It subdivides the screen into smaller tiles and renders them in parallel. Now in the first case above, the subdivision might have become a bit exhausted because of the very high overlap. Moreover, the sprites couldn't be clipped because they were all at the same distance, so all texture coordinates had to be calculated (this could easily be tested by changing the translation in the loop). Also, because of the overlap, the parallelism couldn't be exploited: some tiles were sitting idle while the rest (1/3) were doing a lot of work.

    So I think, while memory bandwidth could be a bottleneck, this wasn't the case in this example. The problem is more because of how the graphics HW works and the setup of the test.
