What is the basic premise behind technology such as is found in Oblivion (and other games, I'm sure; haven't played enough to know), wherein objects from afar are vaguely show
A few techniques, including the ones you've mentioned:
The idea is to throw away as much detail as possible before users start to notice.
There are a number of pre-existing toolkits that will do these things automatically, but they tend to cost money. If you're looking at a serious app, I would recommend at least researching the solutions they include.
For 3D models, the technique is called level of detail (LOD). Essentially, multiple versions of each model are kept available and the right one is used based on context. It is not always just for distance; it can also be used to maintain framerate in other situations.
Be careful though: you should have mipmapping enabled or you'll get sparkling on the larger textures on the lower-res models. Also be careful with animation. Switching the model beneath an animating skeleton is tricky, so one technique is to maintain the same skeleton even when LOD-ing the model.
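To make the "right version based on context" idea concrete, here is a minimal sketch of distance-based selection. The threshold distances, the mesh names, and the `bias` parameter are all made up for illustration; a real engine would tune these per model and may pick by screen-space size instead.

```python
from bisect import bisect_right

# Hypothetical LOD table: distances (world units) at which we switch to the
# next, coarser mesh. The mesh names are placeholders, not real assets.
LOD_DISTANCES = [50.0, 150.0, 400.0]
LOD_MESHES = ["tower_high", "tower_medium", "tower_low", "tower_billboard"]


def select_lod(distance, bias=0):
    """Return the index of the mesh to draw at the given camera distance.

    A positive bias shifts the choice toward coarser meshes, which a renderer
    could raise when the framerate drops (an assumed policy, not a standard).
    """
    level = bisect_right(LOD_DISTANCES, distance) + bias
    # Clamp so that very large distances or biases still give a valid mesh.
    return max(0, min(level, len(LOD_MESHES) - 1))
```

For example, `LOD_MESHES[select_lod(200.0)]` picks the low-detail mesh, and passing `bias=1` pushes every object one step coarser, which is one way the same table could serve the "maintain framerate" case.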
There are dynamic LOD systems for both terrain and for object models, but these can be CPU heavy.
Low-detail versions of models work especially well for vehicles and objects. It's really tricky with terrain. I've worked on Battlezone-type games and PC flight simulators. A little fog helps cover up the "pops" between detail levels.
At the lower details, texture-mapping is often replaced with a few single-color polygons.
"Another more obvious technique is to store really rough 3D models, but then how does the 3D rendering system specifically choose to render the rough model of the building and not the rough models of other less significant (and probably not-viewable-from-that-distance) objects? How would you store something like that along with your heightmap?"
You can grid your terrain and keep a list of which objects are visible in each grid cell.
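A rough sketch of that per-cell list, assuming square cells and a visibility table that was filled in ahead of time (for instance by an offline pass that decides the tower is worth drawing from far away but the barrels next to it are not). `CELL_SIZE`, `VisibilityGrid`, and the object ids are hypothetical names for illustration.

```python
from collections import defaultdict

CELL_SIZE = 100.0  # assumed width of one square terrain cell, in world units


def cell_of(x, z):
    """Map a world-space position to integer grid cell coordinates."""
    return (int(x // CELL_SIZE), int(z // CELL_SIZE))


class VisibilityGrid:
    """Stores, for each terrain cell, the objects worth drawing from there."""

    def __init__(self):
        self._visible = defaultdict(set)

    def mark_visible(self, cell, object_id):
        # Called while building the table, e.g. by an offline tool.
        self._visible[cell].add(object_id)

    def objects_to_draw(self, camera_x, camera_z):
        # At runtime, look up only the cell the camera is standing in.
        return self._visible.get(cell_of(camera_x, camera_z), set())
```

At runtime the renderer only asks `objects_to_draw(camera_x, camera_z)`, so distant insignificant objects are never even considered. The table can be stored alongside the heightmap, keyed by the same cell coordinates.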
One way to handle terrain is to use multi-resolution tiles. Much as Virtual Earth and Google Maps do, you divide the world into tiles recursively, so there are 4 tiles at level 0, 16 at level 1, and so on. Then, using some LOD algorithm to determine the zoom/scale, you load the appropriate terrain tiles for the given area.
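The tile arithmetic can be sketched as below, using the numbering from this answer (a 2x2 grid at level 0, 4x4 at level 1). The function names and the normalized `u`, `v` coordinates in [0, 1) are assumptions for illustration; how you pick the level for a given area is left to whatever LOD rule you use.

```python
def tiles_per_side(level):
    """Number of tiles along one edge of the world at this level."""
    return 2 ** (level + 1)


def tile_count(level):
    """Total tiles at this level: 4 at level 0, 16 at level 1, and so on."""
    return tiles_per_side(level) ** 2


def tile_at(u, v, level):
    """Return (level, col, row) of the tile covering normalized point (u, v).

    u and v are assumed to lie in [0, 1); values at the upper edge are
    clamped into the last tile.
    """
    n = tiles_per_side(level)
    col = min(int(u * n), n - 1)
    row = min(int(v * n), n - 1)
    return (level, col, row)


def children(level, col, row):
    """The four finer tiles that together cover the same area as this tile."""
    return [(level + 1, 2 * col + dx, 2 * row + dy)
            for dy in (0, 1) for dx in (0, 1)]
```

Because each tile splits into exactly four children, refining an area near the camera just means replacing a tile with its `children(...)`, while far-away areas keep their coarse tile.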
One technique is LOD (level of detail). The farther an object is from the camera, the fewer triangles are drawn. Here's a link: http://www.stefan-krause.com/
You can render distant objects to textures; however, you have to re-render the texture every time the perspective changes beyond some threshold. It works great for distant objects where the perspective won't change that often, like your example of the mountains in the distance. However, if the viewpoint is moving too fast or you are too close, you will get a fish-eye effect, kind of like the early days of sky rendering in Quake. This kind of system lends itself to worlds like EVE Online, which contain vast distances.
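One possible way to express the "perspective changes beyond some threshold" test is to compare the direction from the object to the camera now against the direction when the texture was captured. This is only a sketch: the function names and the 2 degree default are assumptions, and it presumes the camera never sits exactly on the object's position.

```python
import math


def view_angle_change(captured_from, camera_pos, object_pos):
    """Angle in radians between the capture-time and current view directions.

    All positions are (x, y, z) tuples. Assumes neither camera position
    coincides with object_pos, since the directions are normalized.
    """
    def direction(point):
        delta = [point[i] - object_pos[i] for i in range(3)]
        length = math.sqrt(sum(c * c for c in delta))
        return [c / length for c in delta]

    a = direction(captured_from)
    b = direction(camera_pos)
    # Clamp to guard against rounding pushing the dot product outside [-1, 1].
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(dot)


def needs_rerender(captured_from, camera_pos, object_pos,
                   max_angle=math.radians(2.0)):
    """True when the cached texture is too stale for the current viewpoint."""
    return view_angle_change(captured_from, camera_pos, object_pos) > max_angle
```

The same sideways step produces a much larger angle change when the camera is close, which is why the cached texture holds up for far-off mountains but has to be refreshed constantly for nearby objects.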
This of course is but another trick in your arsenal and you will still need LOD to some extent.