micro-optimization | 易学教程

Long latency instruction

Question: I would like a long-latency, single-uop x86 instruction in order to create long dependency chains as part of testing microarchitectural features. Currently I'm using fsqrt, but I'm wondering if there is something better. Ideally, the instruction will score well on the following criteria: long latency; stable/fixed latency; one or a few uops (especially: not microcoded); consumes as few uarch resources as possible (load/store buffers, page walkers, etc.); able to chain (latency-wise) with itself.

Threshold an absolute value

Question: I have the following function:

    char f1( int a, unsigned b ) { return abs(a) <= b; }

For execution speed, I want to rewrite it as follows:

    char f2( int a, unsigned b ) { return (unsigned)(a+b) <= 2*b; } // redundant cast

Or alternatively with this signature, which could have subtle implications even for non-negative b:

    char f3( int a, int b ) { return (unsigned)(a+b) <= 2*b; }

Both of these alternatives work under a simple test on one platform, but I need it to be portable. Assuming non-negative b
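One way to probe the claimed equivalence is to compare f1 and f2 exhaustively over a small range. The sketch below is only an illustration under stated assumptions: count_mismatches is a hypothetical helper that is not part of the question, and the range is deliberately kept small so that abs(a) is well defined and 2*b cannot wrap.

```c
#include <stdlib.h>

/* Original form from the question: abs(a) is converted to unsigned
   for the comparison with b. */
char f1(int a, unsigned b) { return abs(a) <= b; }

/* Rewritten form from the question: a + b is evaluated in unsigned
   arithmetic, so a negative sum wraps to a very large value. */
char f2(int a, unsigned b) { return (unsigned)(a + b) <= 2 * b; }

/* Hypothetical helper (not from the question): counts the inputs on which
   f1 and f2 disagree, for a in [-limit, limit] and b in [0, limit].
   Assumes limit is small enough that 2 * limit fits in unsigned. */
unsigned long count_mismatches(int limit)
{
    unsigned long mismatches = 0;
    for (int a = -limit; a <= limit; ++a)
        for (unsigned b = 0; b <= (unsigned)limit; ++b)
            if (f1(a, b) != f2(a, b))
                ++mismatches;
    return mismatches;
}
```

A zero result from count_mismatches only covers the range tested. It says nothing about a == INT_MIN, where abs(a) is undefined, or about b above UINT_MAX/2, where 2*b wraps around and the two forms can disagree (for example a = 0, b = UINT_MAX/2 + 1).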

C coding practices for performance or code size - beyond what a compiler does

Question: I'm looking to see what a programmer can do in C that affects the performance and/or the size of the generated object file. For example:
1. Declaring simple get/set functions as inline may increase performance (at the cost of a larger footprint).
2. For loops that do not use the value of the loop variable itself, count down to zero instead of counting up to a certain value.
It looks like compilers have now advanced to a level where "simple" tricks (like the two points above) are not

Branch on ?: operator?

Question: For a typical modern compiler on modern hardware, will the ?: operator result in a branch that affects the instruction pipeline? In other words, which is faster: calling both cases to avoid a possible branch:

    bool testVar = someValue(); // Used later.
    purge(white);
    purge(black);

or picking the one that actually needs to be purged and doing only that, using the ?: operator:

    bool testVar = someValue();
    purge(testVar ? white : black);

I realize you have no idea how long purge() will take, but I'm just

Cost of exception handlers in Python

Question: In another question, the accepted answer suggested replacing a (very cheap) if statement in Python code with a try/except block to improve performance. Coding style issues aside, and assuming that the exception is never triggered, how much difference does it make (performance-wise) to have an exception handler, versus not having one, versus having a compare-to-zero if statement?

Answer 1: Why don't you measure it using the timeit module? That way you can see whether it's relevant to your

Rotating (by 90°) a bit matrix (up to 8x8 bits) within a 64-bit integer

Question: I have a bit matrix (of size 6x6, or 7x7, or 8x8) stored within one single 64-bit integer. I am looking for C++ code that rotates these matrices by 90, 180, or 270 degrees, as well as C++ code for shifting (horizontally and vertically) and mirroring these matrices. The output must again be a 64-bit integer. Using some of the advanced CPU instruction sets would probably be okay, as would using hash tables or similar techniques: speed is of the highest importance, and RAM is available. I will run
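As a baseline for the operations asked about, here is a minimal sketch for the 8x8 case only. It rests on assumptions the question does not state: the layout (bit index = 8 * row + col, with row 0 at the top and column 0 at the left) and all function names are hypothetical. rotate90_ref is a plain loop meant as a correctness reference for checking faster bit-trick versions, not as the fast solution itself; mirror_rows, flip_rows, and rotate180 use standard mask-and-shift swaps.

```c
#include <stdint.h>

/* Assumed layout (not stated in the question):
   bit index = 8 * row + col, row 0 at the top, column 0 at the left. */

/* Reference 90-degree clockwise rotation: cell (r, c) moves to (c, 7 - r). */
uint64_t rotate90_ref(uint64_t m)
{
    uint64_t out = 0;
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c)
            if ((m >> (8 * r + c)) & 1u)
                out |= (uint64_t)1 << (8 * c + (7 - r));
    return out;
}

/* Mirror left to right: reverse the bits within every byte (row). */
uint64_t mirror_rows(uint64_t m)
{
    m = ((m >> 1) & 0x5555555555555555ULL) | ((m & 0x5555555555555555ULL) << 1);
    m = ((m >> 2) & 0x3333333333333333ULL) | ((m & 0x3333333333333333ULL) << 2);
    m = ((m >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((m & 0x0F0F0F0F0F0F0F0FULL) << 4);
    return m;
}

/* Flip top to bottom: reverse the order of the eight bytes (rows). */
uint64_t flip_rows(uint64_t m)
{
    m = ((m >> 8) & 0x00FF00FF00FF00FFULL) | ((m & 0x00FF00FF00FF00FFULL) << 8);
    m = ((m >> 16) & 0x0000FFFF0000FFFFULL) | ((m & 0x0000FFFF0000FFFFULL) << 16);
    return (m >> 32) | (m << 32);
}

/* 180-degree rotation: cell (r, c) moves to (7 - r, 7 - c),
   which is a row mirror followed by a row flip. */
uint64_t rotate180(uint64_t m)
{
    return flip_rows(mirror_rows(m));
}
```

Under this layout, applying rotate90_ref twice should agree with rotate180, and applying it four times should return the input, which gives a cheap sanity check for any optimized replacement. The 6x6 and 7x7 cases would need their own layout decision and are not covered here.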

PHP micro-optimization

Question: How can I spot useless micro-optimization techniques? What should be avoided?

Answer 1: Any optimization done without measuring and profiling first is useless. PHP code profilers: xDebug, PHP_Debug, time (sometimes it is easy to spot bottlenecks in the code using a simple echo time()). Always measure before optimizing!

Answer 2: Write code that works and is readable. If you find it sluggish, you can always do some profiling.

Answer 3: I'll make myself unpopular and say isset. To check for undefined