Why are conditionally executed instructions not present in later ARM instruction sets?

后端 未结 7 994
隐瞒了意图╮
隐瞒了意图╮ 2020-12-31 02:57

Naively, conditionally executed instructions seem like a great idea to me.

As I read more about ARM (and ARM-like) instruction sets (Thumb2, Unicore, AArch64) I find

相关标签:
7条回答
  • 2020-12-31 03:30

    One of the reasons is because of instruction encoding.

    In thumb, you cannot squeeze four more bits into the tight 16-bit space while there isn't even enough room for the 3 high bits of the register operands and they must be reduced to a subset of only 8 registers. Note that in thumb2 you have a separate IT(E) instruction for selecting the conditions for the next 4 instructions. You can't store the condition in the same instruction though, because of the reason stated above.

    For AArch64 the number of registers has been doubled compared to 32-bit ARM, but again you don't have any remaining bits for the new 3 high bits of the registers. If you want to use the old encoding then you must "borrow" either from the narrow 12-bit immediate or the 4-bit condition. 12 bits are already too small compared to other RISC architectures such as MIPS and reducing the number making everything worse, so removing the condition is a better choice. Because branch prediction has become more and more advanced, it won't be much a problem. It also makes implementing out-of-order execution easier because now there's one less thing to rename and care about

    0 讨论(0)
  • 2020-12-31 03:30

    It's somewhat misleading to say that conditional execution is not present in ARMv8. The issue is to understand why you don't want to execute some instructions. Perhaps in the early ARM days, the actual non-execution of instructions mattered (for power or whatever) but today the significance of this feature is that it allows you to avoid branches for small dumb jumps, for example code like a=(b>0? 1: 2). This sort of thing is more common than you might imagine --- conceptually it's things like MAX/MIN or ABS (though for some CPUs there may be instructions to do these particular tasks).

    In ARMv8, while there are not general conditionally executed instructions there are a few instructions that perform the specific task I am describing, namely allowing you to avoid branching for short dumb jumps; CSEL is the most obvious example, though there are other cases (e.g. conditional setting of conditions) to handle other common patterns (in that case the pattern of C short-circuited expression evaluation).

    IMHO what ARM has done here is what makes the most sense. They've extracted the feature of conditional execution that remains valuable on modern CPUs (avoid many branches) while changing the details of the implementation to match the micro-architecture of modern CPUs.

    0 讨论(0)
  • 2020-12-31 03:37

    Just like, the defer slot in mips being a trick (at the time), conditional execution in arm is a trick (at the time), as is the pc being two instructions ahead. Now down the road how much affect do they have? Will ARMs branch predictor actually make that much difference or is the real answer they needed more bits in a 32 bit instruction word and like thumb the first and easiest thing to get rid of is the condition bits.

    it is not too difficult to do some performance tests to see how good or back the branch predictor really is, I tried it with unconditional branches on an arm11, granted that is an old architecture now but still in wide use. It was difficult at best to get the branch prediction to show any improvement, and in no way, shape, or form could it compete with the conditional execution. I have not repeated these experience on anything in the cortex-a family.

    0 讨论(0)
  • 2020-12-31 03:38

    Conditional execution is a good choice in implementation of many auxiliary or bit-twiddling routines, such as sorting, list or tree manipulation, number to string conversion, sqrt or long division. We could add UART drivers and extracting bit fields in routers. Those have a high branch to non-branch ratio with somewhat high unpredictability too.

    However, once you get beyond the lowest level of services (or increase the abstraction level by using a higher level language), the code looks completely different: code blocks inside different branches of conditions consists more of moving data and calling sub-routines. Here the benefits of those extra 4 bits rapidly fade away. It's not only personal development but cultural: Culturally programming has grown from unstructured (Basic, Fortran, Assembler) towards structural. Different programming paradigms are supported better also in different instruction set architectures.

    A technological compromise could have been the possibility to compress the five bit 'cond.S' field to four or three most frequently used combinations.


    • A paper on profile guided mode selection, giving power, cycle time, code size and instruction count benchmarks for popular SA-110 thumb/ARM compiled routines. Some routines are better in ARM mode and other do better in Thumb. It depends on the algorithm and ultimately the code/compiler.
    0 讨论(0)
  • 2020-12-31 03:49

    On the old ARM v4, the conditional instructions only saved time if there was a high probability that they would end up getting executed, or if the probability was about 50%, then if there were just 2 to 4 of them in a row. If they weren't getting executed, then it was wasting cycles to have to fetch past them, versus the overhead of using a branch to get past them. If they were being executed, the branch would be fetched but not executed.

    A minor nuisance is that when debugging, placing a break on a conditional instruction always resulted in taking a break on that instruction, regardless of the condition (unless there's some really smart debugger that my company didn't have).

    0 讨论(0)
  • 2020-12-31 03:53

    General claim is modern systems have better branch predictors and compilers are much more advanced so their cost on instruction encoding space is not justified.

    This is from ARMv8 Instruction Set Overview

    The A64 instruction set does not include the concept of predicated or conditional execution. Benchmarking shows that modern branch predictors work well enough that predicated execution of instructions does not offer sufficient benefit to justify its significant use of opcode space, and its implementation cost in advanced implementations.

    And it continues

    A very small set of “conditional data processing” instructions are provided. These instructions are unconditionally executed but use the condition flags as an extra input to the instruction. This set has been shown to be beneficial in situations where conditional branches predict poorly, or are otherwise inefficient.

    Another paper titled Trading Conditional Execution for More Registers on ARM Processors claims:

    ... conditional execution takes up precious instruction space as conditions are encoded into a 4-bit condition code selector on every 32-bit ARM instruction. Besides, only small percentages of instructions are actually conditionalized in modern embedded applications, and conditional execution might not even lead to performance improvement on modern embedded processors.

    0 讨论(0)
提交回复
热议问题