micro-optimization

Long latency instruction

我怕爱的太早我们不能终老 submitted on 2020-01-14 19:39:12
Question: I would like a long-latency single-uop x86 instruction, in order to create long dependency chains as part of testing microarchitectural features. Currently I'm using fsqrt, but I'm wondering if there is something better. Ideally, the instruction will score well on the following criteria: long latency; stable/fixed latency; one or a few uops (especially: not microcoded); consumes as few uarch resources as possible (load/store buffers, page walkers, etc.); able to chain (latency-wise) with itself

Threshold an absolute value

白昼怎懂夜的黑 submitted on 2020-01-14 08:50:29
Question: I have the following function:
    char f1( int a, unsigned b ) { return abs(a) <= b; }
For execution speed, I want to rewrite it as follows:
    char f2( int a, unsigned b ) { return (unsigned)(a+b) <= 2*b; } // redundant cast
Or alternatively, with this signature, which could have subtle implications even for non-negative b:
    char f3( int a, int b ) { return (unsigned)(a+b) <= 2*b; }
Both of these alternatives work under a simple test on one platform, but I need it to be portable. Assuming non-negative b
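A quick way to sanity-check the f2 rewrite is to compare it against f1 on boundary values. The sketch below is only a spot check, not a portability proof. It assumes unsigned has one more value bit than int (UINT_MAX == 2 * INT_MAX + 1, as on common platforms), restricts b to at most INT_MAX, and never passes INT_MIN because abs(INT_MIN) is undefined. The helper count_mismatches and its value tables are hypothetical names introduced here for illustration.

```c
#include <limits.h>
#include <stdlib.h>

/* Form from the question, with the implicit int-to-unsigned conversion
   written out. abs(INT_MIN) is undefined, so INT_MIN is never passed. */
static char f1(int a, unsigned b) { return (unsigned)abs(a) <= b; }

/* Proposed rewrite: a + b and 2 * b are both evaluated in unsigned
   arithmetic, so each wraps modulo UINT_MAX + 1. */
static char f2(int a, unsigned b) { return (unsigned)(a + b) <= 2 * b; }

/* Hypothetical helper: counts disagreements between f1 and f2 over a few
   boundary values, all with b <= INT_MAX. */
static int count_mismatches(void)
{
    static const int a_vals[] = { INT_MIN + 1, -1000, -2, -1, 0,
                                  1, 2, 1000, INT_MAX };
    static const unsigned b_vals[] = { 0u, 1u, 2u, 999u, 1000u, 1001u,
                                       (unsigned)INT_MAX - 1u,
                                       (unsigned)INT_MAX };
    int mismatches = 0;

    for (size_t i = 0; i < sizeof a_vals / sizeof a_vals[0]; ++i)
        for (size_t j = 0; j < sizeof b_vals / sizeof b_vals[0]; ++j)
            if (f1(a_vals[i], b_vals[j]) != f2(a_vals[i], b_vals[j]))
                ++mismatches;
    return mismatches;
}
```

Under those assumptions the two functions agree on every pair in the tables. Outside them they can differ: with b = UINT_MAX the product 2*b wraps to UINT_MAX - 1, so f2(0, UINT_MAX) is false while f1(0, UINT_MAX) is true. The rewrite therefore depends on b staying at or below UINT_MAX / 2.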

C coding practices for performance or code size - beyond what a compiler does

我的梦境 submitted on 2020-01-13 01:30:02
Question: I'm looking to see what a programmer can do in C that can determine the performance and/or the size of the generated object file. For example: 1. Declaring simple get/set functions as inline may increase performance (at the cost of a larger footprint). 2. For loops that do not use the value of the loop variable itself, count down to zero instead of counting up to a certain value. It looks like compilers have now advanced to a level where "simple" tricks (like the two points above) are not

Branch on ?: operator?

有些话、适合烂在心里 submitted on 2020-01-11 08:28:11
Question: For a typical modern compiler on modern hardware, will the ?: operator result in a branch that affects the instruction pipeline? In other words, which is faster: calling both cases to avoid a possible branch:
    bool testVar = someValue(); // Used later.
    purge(white);
    purge(black);
or picking the one that actually needs to be purged and only doing it with the ?: operator:
    bool testVar = someValue();
    purge(testVar ? white : black);
I realize you have no idea how long purge() will take, but I'm just

Cost of exception handlers in Python

时光毁灭记忆、已成空白 submitted on 2020-01-08 14:05:27
Question: In another question, the accepted answer suggested replacing a (very cheap) if statement in Python code with a try/except block to improve performance. Coding style issues aside, and assuming that the exception is never triggered, how much difference does it make (performance-wise) to have an exception handler, versus not having one, versus having a compare-to-zero if statement?
Answer 1: Why don't you measure it using the timeit module? That way you can see whether it's relevant to your

Rotating (by 90°) a bit matrix (up to 8x8 bits) within a 64-bit integer

帅比萌擦擦* submitted on 2020-01-04 15:15:10
Question: I have a bit matrix (of size 6x6, 7x7, or 8x8) stored within a single 64-bit integer. I am looking for C++ code that rotates these matrices by 90, 180, or 270 degrees, as well as C++ code for shifting (horizontally and vertically) and mirroring these matrices. The output must again be a 64-bit integer. Using some of the advanced CPU instruction sets would probably be okay, as would hash tables or similar techniques: speed is of the highest importance, and RAM is available. I will run
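As a baseline to test faster versions against, here is a minimal reference sketch for the 8x8 case. It is a plain bit-by-bit loop, not the speed-optimized code the question asks for. The layout is an assumption (bit 8*row + col holds the cell at (row, col), with row 0 at the top and column 0 at the left), and the function names are hypothetical. It is written in plain C, which should also compile as C++.

```c
#include <stdint.h>

/* Assumed layout: bit (8 * row + col) holds the cell at (row, col). */

/* Rotate 90 degrees clockwise: cell (row, col) moves to (col, 7 - row),
   so the top row becomes the rightmost column. */
static uint64_t rotate90_cw(uint64_t m)
{
    uint64_t out = 0;
    for (int row = 0; row < 8; ++row) {
        for (int col = 0; col < 8; ++col) {
            uint64_t bit = (m >> (8 * row + col)) & 1u;
            out |= bit << (8 * col + (7 - row));
        }
    }
    return out;
}

/* 180 and 270 degrees are repeated 90-degree rotations. */
static uint64_t rotate180(uint64_t m) { return rotate90_cw(rotate90_cw(m)); }
static uint64_t rotate270_cw(uint64_t m) { return rotate90_cw(rotate180(m)); }

/* Mirror top to bottom: each row is one byte, so reverse the byte order. */
static uint64_t mirror_rows(uint64_t m)
{
    uint64_t out = 0;
    for (int row = 0; row < 8; ++row) {
        uint64_t row_bits = (m >> (8 * row)) & 0xFFu;
        out |= row_bits << (8 * (7 - row));
    }
    return out;
}
```

With this layout, rotate90_cw(1) moves the top-left cell to the top-right and yields 0x80. An optimized variant (lookup tables, or shift-and-mask transposes) can be checked against these functions over random inputs.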

PHP micro-optimization

有些话、适合烂在心里 submitted on 2020-01-02 09:42:27
Question: How can I spot useless micro-optimization techniques? What should be avoided?
Answer 1: Any optimization done without measuring and profiling first is useless. PHP code profilers: xDebug, PHP_Debug, time (sometimes it is easy to spot bottlenecks in the code using a simple echo time()). Always measure before optimizing!
Answer 2: Write code that works and is readable. If you find it sluggish, you can always do some profiling.
Answer 3: I'll make myself unpopular and say isset. To check for undefined