I'm currently developing a very fast algorithm, one part of which is an extremely fast scanner and statistics function. In this quest, I'm after any performance benefit I can get.
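Creating local variables is effectively free when they are POD types: the compiler reserves their stack space with a single stack-pointer adjustment (or keeps them in registers), and no constructor runs. You are more likely overflowing a cache line with too many stack variables, or hitting some similar alignment-related effect that is very specific to your piece of code. I usually find that it is non-local variables that significantly decrease performance.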
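To make the local versus non-local point concrete, here is a minimal C++ sketch. The names `ScanStats`, `g_stats`, `scan_global` and `scan_local` are made up for illustration and are not taken from your code. It only shows the two shapes being compared; whether one is actually faster depends on your compiler, flags and data, so measure before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical statistics record: trivial and standard-layout, i.e. a POD type.
struct ScanStats {
    std::uint64_t sum;
    std::uint32_t min;
    std::uint32_t max;
};
static_assert(std::is_trivial<ScanStats>::value &&
              std::is_standard_layout<ScanStats>::value,
              "ScanStats is expected to be a POD type");

// Non-local accumulator. Because `data` could legally point into g_stats,
// the compiler may have to reload and store its fields on every iteration.
ScanStats g_stats;

void scan_global(const std::uint32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    g_stats = ScanStats{0, std::numeric_limits<std::uint32_t>::max(), 0};
    for (std::size_t i = 0; i < n; ++i) {
        g_stats.sum += data[i];
        if (data[i] < g_stats.min) g_stats.min = data[i];
        if (data[i] > g_stats.max) g_stats.max = data[i];
    }
}

// Local accumulator. Its address never escapes inside the loop, so the
// optimizer is free to keep all three fields in registers.
ScanStats scan_local(const std::uint32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    ScanStats s{0, std::numeric_limits<std::uint32_t>::max(), 0};
    for (std::size_t i = 0; i < n; ++i) {
        s.sum += data[i];
        if (data[i] < s.min) s.min = data[i];
        if (data[i] > s.max) s.max = data[i];
    }
    return s;
}
```

Both functions compute the same statistics, so you can swap one for the other in a benchmark and compare the generated assembly. If the state really has to live outside the function, copying it into a local at the top and writing it back once at the end usually gets you the same effect.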