branch-prediction

How prevalent is branch prediction on current CPUs?

时光怂恿深爱的人放手 posted on 2019-12-31 11:33:52
Question: Due to the huge impact on performance, I never wonder whether my current-day desktop CPU has branch prediction. Of course it does. But how about the various ARM offerings? Do iPhones or Android phones have branch prediction? The older Nintendo DS? How about the PowerPC-based Wii? The PS3? Whether they have a complex prediction unit is not so important; what matters is whether they have at least some dynamic prediction, and whether they do some execution of instructions following an expected branch. What is the cutoff
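As a point of reference for what "at least some dynamic prediction" can mean, here is a minimal sketch of a single 2-bit saturating counter, the textbook form of dynamic prediction. It is an illustration only and does not describe the predictor of any device named in the question; the class and method names are hypothetical.

```java
// Illustrative sketch: one 2-bit saturating counter, not any real CPU's predictor.
final class TwoBitPredictorDemo {
    /**
     * Replays a sequence of branch outcomes and counts wrong predictions.
     * Counter states 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
     */
    static int countMispredictions(boolean[] outcomes, int initialState) {
        int state = initialState;
        int mispredictions = 0;
        for (boolean taken : outcomes) {
            boolean predictedTaken = state >= 2;
            if (predictedTaken != taken) {
                mispredictions++;
            }
            // Move one step toward the observed outcome, saturating at 0 and 3.
            if (taken) {
                state = Math.min(3, state + 1);
            } else {
                state = Math.max(0, state - 1);
            }
        }
        return mispredictions;
    }

    public static void main(String[] args) {
        // A loop branch that is taken nine times, then falls through once at loop exit.
        boolean[] loopBranch = new boolean[10];
        java.util.Arrays.fill(loopBranch, 0, 9, true);
        System.out.println("cold counter: " + countMispredictions(loopBranch, 0));
        System.out.println("warm counter: " + countMispredictions(loopBranch, 3));
    }
}
```

The counter needs two wrong guesses to flip its prediction, so a single loop exit does not undo what it learned about the loop body.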

Is it true that if we can always fill the delay slot, there is no need for branch prediction?

末鹿安然 posted on 2019-12-31 02:54:06
Question: I'm looking at the five-stage MIPS pipeline (IF, ID, EXE, MEM, WB) in H&P 3rd ed., and it seems to me that the branch decision is resolved in the ID stage, so that by the time the branch instruction reaches its EXE stage, the second instruction after the branch can be fetched correctly. But this still leaves the problem of possibly wasting the first instruction right after the branch instruction. I also encountered the concept of the branch delay slot, which means you want to fill the

How does branch prediction interact with the instruction pointer

荒凉一梦 posted on 2019-12-23 16:30:27
Question: It's my understanding that at the beginning of a processor's pipeline, the instruction pointer (which points to the address of the next instruction to execute) is updated by the branch predictor after fetching, so that this new address can then be fetched on the next cycle. However, if the instruction pointer is modified early on in the pipeline, wouldn't this affect instructions currently in the execute phase that might rely on the old instruction pointer value? For instance, when doing a
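One common way to picture the situation, sketched below under stated assumptions, is that the fetch pointer and an instruction's own address are separate pieces of state: the address is latched with the instruction at fetch and travels down the pipeline with it. The class names are hypothetical, and the target rule (own address plus offset) is a simplification; real ISAs differ, for example by adding the offset to the next instruction's address.

```java
// Illustrative sketch: an in-flight instruction keeps its own address,
// so later changes to the fetch pointer do not affect it.
final class FetchPcDemo {
    /** Hypothetical pipeline latch: the address the instruction was fetched from travels with it. */
    static final class InFlight {
        final int pc;
        final int branchOffset;

        InFlight(int pc, int branchOffset) {
            this.pc = pc;
            this.branchOffset = branchOffset;
        }
    }

    /** Execute stage: a PC-relative target is computed from the latched address. */
    static int resolveTarget(InFlight insn) {
        return insn.pc + insn.branchOffset;
    }

    /** Returns { resolved branch target, fetch pointer at the time of resolution }. */
    static int[] simulate(int branchPc, int offset, int predictedNextPc) {
        int fetchPc = branchPc;
        InFlight branch = new InFlight(fetchPc, offset); // fetch latches the address
        fetchPc = predictedNextPc;                       // predictor redirects the front end
        fetchPc += 4;                                    // one more fetch happens meanwhile
        return new int[] { resolveTarget(branch), fetchPc };
    }

    public static void main(String[] args) {
        int[] result = simulate(0x100, 0x20, 0x400);
        System.out.printf("target=0x%x fetchPc=0x%x%n", result[0], result[1]);
    }
}
```

In this model the resolved target depends only on the latched values, whatever the front end has fetched in the meantime.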

BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?

被刻印的时光 ゝ posted on 2019-12-22 06:55:59
Question: Is there any way to determine, or any resource where I can find, the Branch Target Buffer (BTB) size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake Intel processors? Answer 1: Check the software optimization resources by Agner Fog, http://www.agner.org/optimize/ The BTB should be covered in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf 3.7 Branch prediction in Intel Sandy Bridge and Ivy

What is the overhead of using Intel Last Branch Record?

佐手、 posted on 2019-12-20 10:44:11
Question: Last Branch Record (LBR) refers to a collection of register pairs (MSRs) that store the source and destination addresses of recently executed branches. The document at http://css.csail.mit.edu/6.858/2012/readings/ia32/ia32-3b.pdf has more information in case you are interested. a) Can someone give an idea of how much LBR slows down the execution of common programs, both CPU-intensive and I/O-intensive? b) Will branch prediction be turned off when LBR tracing is on? Answer 1: The paper Intel Code Execution

The inner workings of Spectre (v2)

谁说胖子不能爱 posted on 2019-12-19 08:12:35
Question: I have done some reading about Spectre v2, and obviously you get the non-technical explanations. Peter Cordes has a more in-depth explanation, but it doesn't fully address a few details. Note: I have never performed a Spectre v2 attack, so I do not have hands-on experience. I have only read up about the theory. My understanding of Spectre v2 is that you make an indirect branch mispredict, for instance if (input < data.size) . If the Indirect Target Array (which I'm not too sure of the

The inner workings of Spectre (v2)

删除回忆录丶 posted on 2019-12-19 08:12:08
Question: I have done some reading about Spectre v2, and obviously you get the non-technical explanations. Peter Cordes has a more in-depth explanation, but it doesn't fully address a few details. Note: I have never performed a Spectre v2 attack, so I do not have hands-on experience. I have only read up about the theory. My understanding of Spectre v2 is that you make an indirect branch mispredict, for instance if (input < data.size) . If the Indirect Target Array (which I'm not too sure of the

When should streams be preferred over traditional loops for best performance? Do streams take advantage of branch-prediction?

泪湿孤枕 posted on 2019-12-17 22:35:52
Question: I just read about branch prediction and wanted to try how it works with Java 8 Streams. However, the performance with Streams always turns out to be worse than with traditional loops.

    int totalSize = 32768;
    int filterValue = 1280;
    int[] array = new int[totalSize];
    Random rnd = new Random(0);
    int loopCount = 10000;
    for (int i = 0; i < totalSize; i++) {
        // array[i] = rnd.nextInt() % 2560; // Unsorted Data
        array[i] = i; // Sorted Data
    }
    long start = System.nanoTime();
    long sum = 0;
    for (int j =
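The excerpt is cut off before the measured loop, so the following is only a sketch of the kind of comparison it appears to set up: a filtered sum written once as a plain loop and once as an `IntStream` pipeline. The comparison direction (`>=`), the class name, and the method names are assumptions, not taken from the question. The sketch checks that both forms compute the same result; it is not a benchmark, and trustworthy timings would need a harness such as JMH rather than bare `System.nanoTime()` calls.

```java
import java.util.stream.IntStream;

// Illustrative sketch: the same filtered sum as a loop and as a stream.
final class LoopVsStreamDemo {
    /** Plain loop with a data-dependent branch on each element. */
    static long sumWithLoop(int[] values, int threshold) {
        long sum = 0;
        for (int v : values) {
            if (v >= threshold) {
                sum += v;
            }
        }
        return sum;
    }

    /** Equivalent stream pipeline; widened to long so the sum cannot overflow int. */
    static long sumWithStream(int[] values, int threshold) {
        return IntStream.of(values)
                .filter(v -> v >= threshold)
                .asLongStream()
                .sum();
    }

    public static void main(String[] args) {
        // Same sizes as in the question: sorted data 0..32767, threshold 1280.
        int[] sorted = IntStream.range(0, 32768).toArray();
        System.out.println(sumWithLoop(sorted, 1280) == sumWithStream(sorted, 1280));
    }
}
```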

Branch prediction in a Java for loop

空扰寡人 posted on 2019-12-14 00:33:08
Question: I saw this comment next to an if condition in the source code of the JavaFX SkinBase class:

    // branch prediction favors most often used condition

    protected double computeMinWidth(double height, double topInset, double rightInset, double bottomInset, double leftInset) {
        double minX = 0;
        double maxX = 0;
        boolean firstManagedChild = true;
        for (int i = 0; i < children.size(); i++) {
            Node node = children.get(i);
            if (node.isManaged()) {
                final double x = node.getLayoutBounds().getMinX() + node

C# reinterpret bool as byte/int (branch-free)

∥☆過路亽.° posted on 2019-12-13 03:55:20
Question: Is it possible in C# to turn a bool into a byte or int (or any integral type, really) without branching? In other words, this is not good enough: var myInt = myBool ? 1 : 0; We might say we want to reinterpret a bool as the underlying byte, preferably in as few instructions as possible. The purpose is to avoid branch mispredictions, as seen here. Answer 1: Here is a solution that takes more lines (and presumably more instructions) than I would like, but that actually solves the problem directly