I'm writing a performance-critical, number-crunching C++ project where 70% of the time is used by the 200-line core module.
I'd like to optimize the core using inl
The Microsoft compiler optimises very poorly once inline assembly gets involved. It has to back up registers around the block, because if you use eax it won't move your value to another free register; it will just keep using eax. GCC's inline assembler is far more advanced on this front, since you can tell the compiler which registers you touch and let it allocate the rest.
To get round this, Microsoft started offering intrinsics. These are a far better way to do your optimisation, because they let the compiler work with you. As Chris mentioned, inline assembly also doesn't work under x64 with the MS compiler, so on that platform you REALLY are better off just using the intrinsics.
They are easy to use and give good performance. I will admit I can often squeeze a few more cycles out by using an external assembler, but they're bloody good for the productivity improvement they provide.
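To give a feel for what that looks like, here is a minimal sketch using the SSE intrinsics from `<xmmintrin.h>`. The `mul_add` kernel is a made-up stand-in for your core loop, not anything from your project, and the `HAVE_SSE` macro is just a local name I picked so the code falls back to plain C++ when SSE isn't available. Treat it as an illustration under those assumptions rather than tuned code.

```cpp
#include <cstddef>

// Only pull in SSE intrinsics where the compiler says they are available.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1   // local helper macro (hypothetical name)
#else
#define HAVE_SSE 0
#endif

// Hypothetical kernel: out[i] = a[i] * b[i] + c[i]
void mul_add(const float* a, const float* b, const float* c,
             float* out, std::size_t n)
{
    std::size_t i = 0;
#if HAVE_SSE
    // Process four floats per iteration. The compiler chooses the XMM
    // registers itself, so nothing has to be backed up by hand.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   // unaligned loads
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
#endif
    // Scalar loop handles the tail (or everything, without SSE).
    for (; i < n; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

The same source builds for x86 and x64 with both the MS compiler and GCC, which is the main practical win over `__asm` blocks. Whether it actually beats what the optimiser generates from the plain loop is something you'd have to measure on your own code.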