The following testing code does correctly in VS either with debug or release, and also in GCC. It also does correctly for ICC with debug, but not when optimization enabled (
Your best bet is to take the resulting binary step into it and dis-assemble the main function and see what assembly was generated. Not saying you will be able to see a bug, but you can see if something was optimized out.