I understand mipmapping pretty well. What I do not understand (on a hardware/driver level) is how mipmapping improves the performance of an application (at least this is oft
Mipmaps are useful at least for two reasons:
see info here: http://www.tomshardware.com/reviews/ati,819-2.html
You are no doubt aware that each texel in the lower LODs of the mip-chain covers a higher percentage of the total texture image area, correct?
When you sample a texture at a distant location the hardware will use a lower LOD. When this happens, the sample neighborhood necessary to resolve minification becomes smaller, so fewer (uncached) fetches are necessary. It is all about the amount of memory that actually has to be fetched during texture sampling, and not the amount of memory occupied (assuming you are not running into texture thrashing).
I think this probably deserves a visual representation, so I will borrow the following diagram from the excellent series of tutorials at arcsynthesis.org.
On the left, you see what happens when you naïvely sample at a single LOD all of the time (this diagram is showing linear minification filtering, by the way) and on the right you see what happens with mipmapping. Not only does it improve image quality by more closely matching the fragment's effective size, but because the number of texels in lower mipmap LODs are fewer it can be cached much more efficiently.