Understanding branch prediction

Submitted by 前提是你 on 2019-12-01 11:07:50

I took my time reading the reference manual for the Cortex-A8: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/DDI0344K_cortex_a8_r3p2_trm.pdf

From section 5.1:

The processor contains program flow prediction hardware, also known as branch prediction. With program flow prediction disabled, all taken branches incur a 13-cycle penalty. With program flow prediction enabled, all mispredicted branches incur a 13-cycle penalty.

Basically, this means that static branch prediction always assumes branches are not taken. This differs from PowerPC, which has "special instructions" (a +/- suffix on the branch) for hinting the processor about taken/not-taken branches.
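To make the two sentences from section 5.1 concrete, here is a rough sketch of the penalty arithmetic. Only the 13-cycle figure comes from the manual; the function names, the example branch history and the "always taken" predictor are assumptions made up for illustration, not how the Cortex-A8 predictor actually behaves.

```python
PENALTY_CYCLES = 13  # figure quoted from section 5.1 of the Cortex-A8 TRM


def penalty_without_prediction(outcomes):
    """Prediction disabled: every taken branch pays the penalty."""
    return PENALTY_CYCLES * sum(1 for taken in outcomes if taken)


def penalty_with_prediction(outcomes, predictions):
    """Prediction enabled: only mispredicted branches pay the penalty."""
    return PENALTY_CYCLES * sum(
        1 for taken, predicted in zip(outcomes, predictions) if taken != predicted
    )


# Hypothetical loop branch: taken 9 times, then falls through once.
outcomes = [True] * 9 + [False]
# Assumed, idealised predictor that always guesses "taken".
always_taken = [True] * len(outcomes)

print(penalty_without_prediction(outcomes))             # 117
print(penalty_with_prediction(outcomes, always_taken))  # 13
```

With prediction off, the nine taken iterations cost 9 x 13 cycles; with the assumed predictor only the final loop exit is mispredicted.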

From section 1.3.1:

The instruction fetch unit predicts the instruction stream, fetches instructions from the L1 instruction cache, and places the fetched instructions into a buffer for consumption by the decode pipeline.

  1. Instruction Fetch, the first stage, makes the prediction.

From section 7.6.2:

An instruction can remain in the pipeline between being fetched and being executed. Because there can be several unresolved branches in the pipeline, instruction fetches are speculative, meaning there is no guarantee that they are executed. A branch or exceptional instruction in the code stream can cause a pipeline flush, discarding the currently fetched instructions. Fetches or instruction table walks that begin without an empty pipeline are marked speculative. If the pipeline contains any instruction up to the point of branch and exception resolution, then the pipeline is considered not empty.

I interpret this as meaning that nothing reaches the execution stage while a branch is being processed. If a misprediction occurs, as discovered when the branch is executed in Instruction Execute, all instructions in the pipeline are "flushed". They are never executed. That should answer questions 2 and 4. I'm not so sure how the "marking" is performed.
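As a toy model of that interpretation (not of the real hardware, since the manual does not say how the marking is done), one can picture each pipeline slot carrying a "speculative" flag that is acted on when the branch resolves. The `Slot` class and `resolve_branch` function are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    speculative: bool  # fetched while an older branch was still unresolved


def resolve_branch(pipeline, mispredicted):
    """Return (slots allowed to execute, slots flushed) once the branch resolves."""
    if mispredicted:
        kept = [s for s in pipeline if not s.speculative]
        flushed = [s for s in pipeline if s.speculative]
        return kept, flushed
    # Prediction was correct: speculative fetches become ordinary work.
    return [Slot(s.name, False) for s in pipeline], []


# Hypothetical pipeline contents: two instructions fetched past the branch.
pipeline = [
    Slot("add", False),
    Slot("branch", False),
    Slot("ldr", True),
    Slot("mul", True),
]

kept, flushed = resolve_branch(pipeline, mispredicted=True)
print([s.name for s in kept])     # ['add', 'branch']
print([s.name for s in flushed])  # ['ldr', 'mul']
```

The flushed instructions are simply dropped here, which matches the "they are never executed" reading above.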

  3. I don't know how it sends the signal. As far as I can tell, the reference manual does not cover that part. Guess it's magic.

(For the record, I find the PowerPC reference manuals (e500/e600) I'm used to much easier to understand because of their many instruction timing samples.)

I guess that there are many different mechanisms that are possible, but some quick answers:

  1. Branch prediction certainly needs to happen before the instructions are decoded, during the fetch stages. Otherwise, you're going to decode instructions that are not correct.
  2. You will normally give extra information with the branch instruction that was predicted, like the target that was predicted. The branch will be executed, and if the real target does not match the predicted target, you will need to flush the pipe.
  3. It really depends on the implementation. If the branch is executed, you can use the real target, like a branch that was not predicted.
  4. You certainly need a mechanism to recover, or you have to wait for the branches to be resolved before you write the results. This will lose some time, but not as much as a branch that was not predicted.
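The four points above can be sketched roughly as follows. This is only one of the many possible mechanisms: the "remember the last target" table, the `ToyPredictor` class, the addresses and the fixed 4-byte instruction size are all assumptions for illustration and do not describe any particular processor.

```python
INSTR_SIZE = 4  # assumed fixed-width instructions


class ToyPredictor:
    def __init__(self):
        # Hypothetical table: branch address -> target seen last time.
        self.last_target = {}

    def predict(self, pc):
        """Fetch stage: nothing is decoded yet, so only the address is known."""
        return self.last_target.get(pc, pc + INSTR_SIZE)

    def resolve(self, pc, predicted_target, real_target):
        """Execute stage: compare the carried prediction with the real target."""
        self.last_target[pc] = real_target
        return predicted_target != real_target  # True means "flush the pipe"


predictor = ToyPredictor()
branch_pc = 0x100
# Hypothetical loop branch: jumps back to 0x80 three times, then falls through.
real_targets = [0x80, 0x80, 0x80, 0x104]

flushes = []
for real in real_targets:
    predicted = predictor.predict(branch_pc)  # travels down the pipe with the branch
    flushes.append(predictor.resolve(branch_pc, predicted, real))

print(flushes)  # [True, False, False, True]
```

The first and last iterations flush because the predicted target differs from the real one; in both cases fetching would resume from the real target computed at execute.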