Have been reading Agner Fog\'s \"The microarchitecture of Intel, AMD and VIA CPUs\" and on page 34 he describes \"return address prediction\":
http://www.agner.org/o
Predictors are normally part of the fetch stage, in order to determine which instructions to fetch next. This takes place before the processor has decoded the instructions, and therefore doesn't even know with certainty that a branch instruction exists. Like all predictors, the intent of the return address predictor is to get the direction / target of the branch faster. A return instruction is a branch, and so it would normally have a branch predictor entry to determine whether it is taken and where the target is. The return address predictor is consulted in lieu of the normal branch target buffer.
So perhaps 50 instructions before the return statement is actually "executed", the fetch stage predicts a return instruction and the instruction address to fetch next. Later, when the return is executed, the address is read from the stack and compared with where the return was predicted to go. If they are the same, execution continues, else execution is rolled back to use the proper return address.
Why store on the stack? First, the processor does not know if the predictor has worked without comparing against the address stored on the stack. Second, the stack is the "official" return address, which might be changed for legitimate reasons. Third, the return address predictor has a limited number of entries. The stack is needed for the return instructions for which there was not room to store the addresses in the predictor.
On top of Brians' great explanation, there's the fact that the stack resides in memory. You don't want to have to go to the memory unit and do a memory lookup (not to mention address translation into physical), from some stack address everytime you want to predict the outcome of a branch. The branch prediction wants to be self-sufficient. You can also view the RSB as another form of caching data.