branch-prediction | 易学教程

Avoid stalling pipeline by calculating conditional early

Question: When talking about the performance of ifs, we usually talk about how mispredictions can stall the pipeline. The recommended solutions I see are: trust the branch predictor for conditions that usually have one result; avoid branching with a little bit of bit-magic if reasonably possible; or use conditional moves where possible. What I couldn't find was whether or not we can calculate the condition early to help where possible. So, instead of: ... work if (a > b) { ... more work } Do something
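The idea in the question can be sketched in C as follows. This is only an illustration under stated assumptions: `sum_array`, `late_condition`, and `early_condition` are hypothetical names, and the summation stands in for the unspecified "... work". Both functions return the same value; the only difference is where the comparison appears in the source. A compiler is free to reorder either version, and an out-of-order CPU may resolve the branch as soon as `a` and `b` are available regardless, so any benefit would have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "... work" that precedes the branch. */
static long sum_array(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Source order from the question: work first, then compare and branch. */
long late_condition(const int *v, size_t n, int a, int b)
{
    long s = sum_array(v, n);
    if (a > b)
        s *= 2;              /* stand-in for "... more work" */
    return s;
}

/* Variant: the comparison is written before the work and kept in a flag. */
long early_condition(const int *v, size_t n, int a, int b)
{
    int take = (a > b);      /* both inputs are already known here */
    long s = sum_array(v, n);
    if (take)
        s *= 2;
    return s;
}
```

Inspecting the generated assembly for both functions is the only way to tell whether the source-level reordering survives optimization on a given compiler.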

Can I use GCC's __builtin_expect() with ternary operator in C

Question: The GCC manual only shows examples where __builtin_expect() is placed around the entire condition of an 'if' statement. I also noticed that GCC does not complain if I use it, for example, with a ternary operator, or in any arbitrary integral expression for that matter, even one that is not used in a branching context. So, I wonder what the underlying constraints of its usage actually are. Will it retain its effect when used in a ternary operation like this: int foo(int i) { return __builtin
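A minimal sketch of the two usages the question describes, assuming GCC or a compatible compiler such as Clang. The macro `EXPECT` and the functions `clamp_percent`, `passthrough_plus_one`, and `self_check` are hypothetical names introduced here, not taken from the question. The sketch only demonstrates that the builtin returns its first argument unchanged, so it is syntactically valid on a ternary condition or inside an ordinary integer expression; it makes no claim about whether the hint changes the generated code in either position.

```c
#include <assert.h>

/* Hypothetical wrapper; falls back to the bare expression on other compilers. */
#if defined(__GNUC__)
#define EXPECT(expr, val) __builtin_expect((expr), (val))
#else
#define EXPECT(expr, val) (expr)
#endif

/* Hint on the condition of a ternary: "i > 100" is expected to be false. */
int clamp_percent(int i)
{
    return EXPECT(i > 100, 0) ? 100 : i;
}

/* Hint inside a plain arithmetic expression, with no branch in the source. */
long passthrough_plus_one(long v)
{
    return EXPECT(v, 0L) + 1;
}

/* Usage example: the hint never changes the computed values. */
void self_check(void)
{
    assert(clamp_percent(42) == 42);
    assert(clamp_percent(250) == 100);
    assert(passthrough_plus_one(9) == 10);
}
```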

Branch mispredictions

Question: This question may be silly, but I will ask it anyway. I've heard about branch prediction from Mysticial's answer, and I want to know if it is possible for the following to happen. Let's say I have this piece of C++ code: while(memoryAddress = getNextAddress()){ if(haveAccess(memoryAddress)) // change the value of *memoryAddress else // do something else } So if the branch predictor predicts wrongly in some case that the if statement is true and then the program changes the value of

Can branch prediction cause illegal instruction?

Question: In the following pseudo-code: if (rdtscp supported by hardware) { Invoke "rdtscp" instruction } else { Invoke "rdtsc" instruction } Let's say the CPU does not support the rdtscp instruction and so we fall back to the else statement. If the CPU mispredicts the branch, is it possible for the instruction pipeline to try to execute rdtscp and throw an Illegal Instruction error? Answer 1: It is explicitly documented for the #UD trap (Invalid Opcode Execution) in the Intel Processor Manuals, Volume 3A,

Can branch prediction crash my program?

Question: Going through chapter 3 of this book called Computer Systems Architecture: A programmer's perspective, it is stated that an implementation like testl %eax, %eax cmovne (%eax), %edx is invalid because if the prediction fails, then we'll have NULL dereferencing. It is also stated that we should use branching code. Still, wouldn't using conditional jumps lead to the same result? For example: .L1: jmp *%eax testl %eax, %eax jne .L1 Is it possible to trick gcc to output something like that for an

Performance of branch prediction in a loop

Question: Would there be any noticeable speed difference between these two snippets of code? Naively, I think the second snippet would be faster because branch instructions are encountered a lot less, but on the other hand the branch predictor should solve this problem. Or will it have a noticeable overhead despite the predictable pattern? Assume that no conditional move instruction is used. Snippet 1: for (int i = 0; i < 100; i++) { if (a == 3) output[i] = 1; else output[i] = 0; } Snippet 2: if (a ==
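The second snippet is cut off in this excerpt, so the following is only a sketch of the general technique the question seems to describe: testing a loop-invariant condition once, outside the loop. `fill_branch_inside` mirrors Snippet 1 with the array length made a parameter; `fill_branch_hoisted` is an assumed form of the hoisted version and is not the asker's actual code. The function names and the `n` parameter are hypothetical. Optimizing compilers may perform this transformation on their own (it is commonly called loop unswitching), so the two functions can end up compiling to similar machine code.

```c
#include <assert.h>
#include <stddef.h>

/* Snippet 1 from the question: the invariant test runs on every iteration. */
void fill_branch_inside(int *output, size_t n, int a)
{
    for (size_t i = 0; i < n; i++) {
        if (a == 3)
            output[i] = 1;
        else
            output[i] = 0;
    }
}

/* Assumed hoisted form: the test runs once and selects one of two loops. */
void fill_branch_hoisted(int *output, size_t n, int a)
{
    if (a == 3) {
        for (size_t i = 0; i < n; i++)
            output[i] = 1;
    } else {
        for (size_t i = 0; i < n; i++)
            output[i] = 0;
    }
}

/* Usage example: both versions must produce identical arrays. */
void self_check(void)
{
    int x[100], y[100];
    fill_branch_inside(x, 100, 3);
    fill_branch_hoisted(y, 100, 3);
    for (size_t i = 0; i < 100; i++)
        assert(x[i] == 1 && y[i] == 1);
}
```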

Why predict a branch, instead of simply executing both in parallel?

I believe that when creating CPUs, branch prediction is a major slowdown when the wrong branch is chosen. So why do CPU designers choose a branch instead of simply executing both branches, then cutting one off once you know for sure which one was chosen? I realize that this could only go 2 or 3 branches deep within a short number of instructions or the number of parallel stages would get ridiculously large, so at some point you would still need some branch prediction since you definitely will run across larger branches, but wouldn't a couple of stages like this make sense? Seems to me like it

Does a branch misprediction flush the entire pipeline, even for very short if-statement body?

Everything I've read seems to indicate that a branch misprediction always results in the entire pipeline being flushed, which means a lot of wasted cycles. I never hear anyone mention any exceptions for short if-conditions. This seems like it would be really wasteful in some cases. For example, suppose you have a lone if-statement with a very simple body that is compiled down to 1 CPU instruction. The if-clause would be compiled into a conditional jump forward by one instruction. If the CPU predicts the branch to not be taken, then it will begin executing the if-body instruction, and can
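The scenario above, a lone if-statement whose body is a single instruction, can be illustrated with a small C sketch. It does not answer whether the hardware flushes the pipeline; it only shows three ways to write such a statement, two of which give the compiler an opportunity to avoid the conditional jump altogether. All function names are hypothetical. Whether `clamp_select` becomes a conditional move is up to the compiler and target, and `clamp_mask` assumes a two's-complement `int`.

```c
#include <assert.h>
#include <limits.h>

/* Lone if-statement with a one-instruction body. */
int clamp_branch(int x)
{
    if (x < 0)
        x = 0;
    return x;
}

/* Same logic as a select; compilers often, but not always, emit a cmov. */
int clamp_select(int x)
{
    return x < 0 ? 0 : x;
}

/* Branch-free arithmetic: keep is all ones when x >= 0, zero otherwise. */
int clamp_mask(int x)
{
    int keep = -(x >= 0);   /* -1 (all bits set) or 0 */
    return x & keep;
}

/* Usage example: all three variants agree. */
void self_check(void)
{
    int samples[] = { INT_MIN, -5, -1, 0, 1, 7, INT_MAX };
    for (int i = 0; i < 7; i++) {
        int expected = clamp_branch(samples[i]);
        assert(clamp_select(samples[i]) == expected);
        assert(clamp_mask(samples[i]) == expected);
    }
}
```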