One place where we were forced to duplicate code was our pixel manipulation code. We work with VERY large images, and function call overhead was eating up on the order of 30% of our per-pixel time.
Duplicating the pixel manipulation code gave us 20% faster image traversal at the cost of code complexity.
This is obviously a very rare case, and in the end it bloated our source significantly (a 300-line function is now 1200 lines).
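
To make the trade-off concrete, here is a minimal sketch in C. It is not our actual code: it assumes the overhead came from a per-pixel call the compiler could not inline (a call through a function pointer here), and the names `apply_generic`, `apply_invert`, `apply_halve`, `invert`, and `halve` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel operations on 8-bit grayscale pixels. */
typedef uint8_t (*pixel_op)(uint8_t);

uint8_t invert(uint8_t p) { return (uint8_t)(255 - p); }
uint8_t halve(uint8_t p)  { return (uint8_t)(p / 2); }

/* Generic version: one traversal loop, one indirect call per pixel. */
void apply_generic(uint8_t *pixels, size_t count, pixel_op op)
{
    assert(pixels != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        pixels[i] = op(pixels[i]);
}

/* Duplicated versions: the same traversal loop is copied once per
 * operation, with the operation written directly in the loop body,
 * so there is no call per pixel. */
void apply_invert(uint8_t *pixels, size_t count)
{
    assert(pixels != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        pixels[i] = (uint8_t)(255 - pixels[i]);
}

void apply_halve(uint8_t *pixels, size_t count)
{
    assert(pixels != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        pixels[i] = (uint8_t)(pixels[i] / 2);
}
```

Both styles produce the same pixels; the only difference is where the per-pixel work lives. Whether the duplicated form is actually faster depends on the compiler and on whether it can inline the call by itself, so measure before committing to the extra code.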