How to deal with branch prediction when using a switch case in CPU emulation

前端 未结 4 1282
野趣味
野趣味 2021-02-02 11:59

I recently read the question here Why is it faster to process a sorted array than an unsorted array? and found the answer to be absolutely fascinating and it has completely chan

4条回答
  •  隐瞒了意图╮
    2021-02-02 12:41

    As the branches on your 256-way switch statement are densely packed the compiler will implement this as a jump table, so you're correct in that you'll trigger a single branch mispredict every time you pass through this code (as the indirect jump won't display any kind of predictable behaviour). The penalty associated with this will be around 15 clock cycles on a modern CPU (Sandy Bridge), or maybe up to 25 on older microarchitectures that lack a micro-op cache. A good reference for this sort of thing is "Software optimisation resources" on agner.org. Page 43 in "Optimizing software in C++" is a good place to start.

    http://www.agner.org/optimize/?e=0,34

    The only way you could avoid this penalty is by ensuring that the same instructions are execution regardless of the value of the opcode. This can often be done by using conditional moves (which add a data dependency so are slower than a predictable branch) or otherwise looking for symmetry in your code paths. Considering what you're trying to do this is probably not going to be possible, and if it was then it would almost certainly add a overhead greater than the 15-25 clock cycles for the mispredict.

    In summary, on a modern architecture there's not much you can do that'll be more efficient than a switch/case, and the cost of mispredicting a branch isn't as much as you might expect.

提交回复
热议问题