branch-prediction

How to deal with branch prediction when using a switch case in CPU emulation

好久不见. 提交于 2019-12-03 01:49:20
I recently read the question here Why is it faster to process a sorted array than an unsorted array? and found the answer absolutely fascinating; it has completely changed my outlook on programming when dealing with branches that depend on data. I currently have a fairly basic but fully functioning interpreted Intel 8080 emulator written in C; the heart of the operation is a 256-case switch table for handling each opcode. My initial thought was that this would obviously be the fastest method of working, as opcode encoding isn't consistent throughout the 8080 instruction set and
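The dispatch loop the asker describes can be sketched as follows. This is a minimal illustration, not the asker's code: the `cpu_state` layout, the `step` function, and the handful of handled opcodes are assumptions; a real emulator would implement all 256 cases.

```c
#include <stdint.h>

/* Minimal sketch of switch-based opcode dispatch for an 8080-style
 * interpreter.  The switch compiles to one big indirect branch, which
 * is hard to predict when opcodes arrive in data-dependent order. */
typedef struct {
    uint16_t pc;           /* program counter */
    uint8_t  a;            /* accumulator */
    uint8_t  mem[65536];   /* 64 KiB address space */
} cpu_state;

static void step(cpu_state *s) {
    uint8_t opcode = s->mem[s->pc++];   /* fetch next opcode */
    switch (opcode) {
    case 0x00: /* NOP   */                 break;
    case 0x3C: /* INR A */  s->a++;        break;
    case 0x3D: /* DCR A */  s->a--;        break;
    default:   /* unimplemented opcode */  break;
    }
}
```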

Why is (a*b != 0) faster than (a != 0 && b != 0) in Java?

一笑奈何 · Submitted on 2019-12-03 01:30:47
I'm writing some code in Java where, at some point, the flow of the program is determined by whether two int variables, "a" and "b", are non-zero (note: a and b are never negative, and never within integer overflow range). I can evaluate it with if (a != 0 && b != 0) { /* Some code */ } or alternatively if (a*b != 0) { /* Some code */ } Because I expect that piece of code to run millions of times per run, I was wondering which one would be faster. I did the experiment by comparing them on a
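The two predicates are shown below in C for illustration (the question is about Java, but the trade-off is the same; the function names are made up). Given the asker's stated precondition of non-negative, non-overflowing operands, a*b is non-zero exactly when both factors are non-zero, so the forms are equivalent. The && form compiles to short-circuiting conditional branches, while the a*b form trades a branch for a multiply, which can win when the branch outcome is data-dependent and unpredictable.

```c
/* Two equivalent ways to test that both operands are non-zero,
 * assuming a and b are non-negative and a*b cannot overflow. */
static int both_nonzero_branchy(int a, int b)  { return a != 0 && b != 0; }
static int both_nonzero_multiply(int a, int b) { return a * b != 0; }
```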

Intel CPUs Instruction Queue provides static branch prediction?

空扰寡人 · Submitted on 2019-12-02 22:16:45
Volume 3 of the Intel manuals contains the description of a hardware event counter: BACLEAR_FORCE_IQ — counts the number of times a BACLEAR was forced by the Instruction Queue. The IQ is also responsible for providing conditional branch prediction direction based on a static scheme and dynamic data provided by the L2 Branch Prediction Unit. If the conditional branch target is not found in the Target Array and the IQ predicts that the branch is taken, then the IQ will force the Branch Address Calculator to issue a BACLEAR. Each BACLEAR asserted by the BAC generates approximately an 8 cycle

How prevalent is branch prediction on current CPUs?

大城市里の小女人 · Submitted on 2019-12-02 22:04:33
Given the huge impact on performance, I never wonder whether my current-day desktop CPU has branch prediction. Of course it does. But what about the various ARM offerings? Do iPhone or Android phones have branch prediction? The older Nintendo DS? What about the PowerPC-based Wii, or the PS3? Whether they have a complex prediction unit is not so important; what matters is whether they have at least some dynamic prediction, and whether they do some execution of instructions following an expected branch. What is the cutoff for CPUs with branch prediction? A handheld calculator from decades ago obviously doesn't have one,

Intel x86 0x2E/0x3E Prefix Branch Prediction actually used?

被刻印的时光 ゝ · Submitted on 2019-12-02 20:16:04
The latest Intel software dev manual describes two opcode prefixes: Group 2 > Branch Hints — 0x2E: Branch Not Taken; 0x3E: Branch Taken. These allow for explicit branch prediction of jump instructions (opcodes like Jxx). I remember reading a couple of years ago that on x86 explicit branch prediction was essentially a no-op in the context of GCC's branch-prediction intrinsics. I am now unclear whether these x86 branch hints are a new feature or whether they are essentially no-ops in practice. Can anyone clear this up? (That is: do GCC's branch-prediction functions generate these x86 branch hints?
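For context, GCC's intrinsic mentioned above is __builtin_expect. On modern x86 targets GCC uses the hint to steer basic-block layout (placing the likely path on the fall-through), rather than emitting the 0x2E/0x3E prefix bytes. The likely/unlikely macro spelling below is the common Linux-kernel idiom, and parse_byte is just an illustrative function:

```c
/* Common wrappers around GCC's branch-prediction intrinsic.  The !! 
 * normalizes any truthy value to exactly 0 or 1 for the comparison. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int parse_byte(int c) {
    if (unlikely(c < 0))   /* error path: laid out off the hot fall-through */
        return -1;
    return c & 0xFF;       /* hot path: truncate to one byte */
}
```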

Can I measure branch-prediction failures on a modern Intel Core CPU?

▼魔方 西西 · Submitted on 2019-12-02 19:14:48
This question and its answer, which was recently tagged as an Epic Answer, have prompted me to wonder: can I measure the performance of a running application on Windows in terms of its CPU branch-prediction failures? I know that some static analysis tools exist that might help with optimizing code for good performance in branch-prediction situations, and that manual techniques could help by simply making changes and re-testing, but I'm looking for some automatic mechanism that can report a total number of branch-prediction failures, over a period of time, as a Windows application runs, and I'm
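The asker wants this on Windows (where ETW-based tools and Intel VTune are the usual routes); as a sketch of the underlying mechanism, the Linux analog below opens a hardware counter for branch mispredictions with perf_event_open. The hot_loop workload is illustrative, and reading the counter may require suitable perf_event_paranoid permissions:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>

/* Open a per-thread hardware counter for branch mispredictions.
 * Returns a file descriptor, or -1 if the kernel refuses (e.g. due to
 * perf_event_paranoid settings). */
static long perf_open_branch_misses(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_BRANCH_MISSES;
    attr.disabled = 1;           /* start stopped; enable around the region */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Example region to measure: a data-dependent branch that a dynamic
 * predictor handles poorly because the outcome is pseudo-random. */
static unsigned hot_loop(void) {
    unsigned sum = 0;
    for (unsigned i = 0; i < 1000000u; i++)
        if ((i * 2654435761u) >> 31)   /* top bit of a hashed index */
            sum++;
    return sum;
}
```

Typical use: open the counter, ioctl PERF_EVENT_IOC_RESET then PERF_EVENT_IOC_ENABLE, run hot_loop, PERF_EVENT_IOC_DISABLE, and read(2) the 64-bit miss count from the descriptor.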

Performance of “conditional call” on amd64

回眸只為那壹抹淺笑 · Submitted on 2019-12-01 16:17:09
When considering a conditional function call in a critical section of code I found that both gcc and clang will branch around the call. For example, for the following (admittedly trivial) code:

    int32_t __attribute__((noinline)) negate(int32_t num) {
        return -num;
    }

    int32_t f(int32_t num) {
        int32_t x = num < 0 ? negate(num) : num;
        return 2*x + 1;
    }

Both GCC and clang compile it to essentially the following:

    .global _f
    _f:
        cmp     edi, 0
        jg      after_call
        call    _negate
    after_call:
        lea     rax, [rax*2+1]
        ret

This got me thinking: what if x86 had a conditional call instruction like ARM? Imagine if there was such

Why did Intel change the static branch prediction mechanism over these years?

馋奶兔 · Submitted on 2019-12-01 15:47:54
From here I know Intel has implemented several static branch-prediction mechanisms over the years: 80486 era: always not-taken; Pentium 4 era: backwards taken / forwards not-taken. Newer CPUs like Ivy Bridge and Haswell have become increasingly opaque; see Matt G's experiment here. And Intel seems not to want to talk about it any more, because the latest material I found in an Intel document was written about ten years ago. I know static branch prediction is (far?) less important than dynamic, but in quite a few situations the CPU will be completely lost, and programmers (with the compiler) are usually the best
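As a sketch of why the Pentium 4-era rule mattered to code layout: under backwards-taken/forwards-not-taken, a cold branch with no BTB history is guessed correctly if the backward branch is a loop (usually taken) and the forward branch skips a rare path (usually not taken). The function below is illustrative, not from the question:

```c
/* Layout that agrees with the backwards-taken/forwards-not-taken
 * static rule on a cold (BTB-miss) encounter. */
static long sum_array(const int *v, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {   /* backward loop branch: statically
                                       predicted taken, and it usually is */
        if (v[i] < 0)               /* forward branch to a rare error exit:
                                       statically predicted not-taken */
            return -1;
        sum += v[i];
    }
    return sum;
}
```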
