As we know from C11-memory_order: http://en.cppreference.com/w/c/atomic/memory_order
And the same from C++11-std::memory_order: http://en.cppreference.com/w/cpp
It is true that normal[1] SSE load and store instructions, as well as the implied load when using a memory source operand, have the same acquire and release behavior in terms of ordering as normal loads and stores of GP registers.
They are not, however, generally useful directly to implement std::memory_order_acquire or std::memory_order_release operations on std::atomic objects larger than 8 bytes, because there is no guarantee of atomicity for SSE or AVX loads and stores larger than 8 bytes. The missing guarantee isn't just theoretical: there are several implementations (including brand new ones like AMD's Ryzen) that split large loads or stores up into two smaller ones.
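To illustrate the practical consequence, here is a minimal sketch of one common way to hand over a 16-byte value without depending on a single atomic 16-byte access. The names Pair16, Mailbox, publish and try_consume are hypothetical and not from the question or the accepted answer. The sketch assumes one producer that publishes exactly once: the payload is written with plain stores, which the hardware may split, and only a small flag carries the release/acquire ordering.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 16-byte payload: wider than 8 bytes, so a single SSE
// load or store of it is not guaranteed to be atomic.
struct Pair16 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Hypothetical one-shot mailbox: the payload itself is NOT atomic;
// only the small flag is.
struct Mailbox {
    Pair16 payload{0, 0};
    std::atomic<bool> ready{false};
};

// Producer side (assumption: called at most once per Mailbox).
void publish(Mailbox& m, Pair16 v) {
    m.payload = v;                                   // plain stores, may be split
    m.ready.store(true, std::memory_order_release);  // orders the stores above before the flag
}

// Consumer side: returns true and fills `out` only after the flag was observed.
bool try_consume(Mailbox& m, Pair16& out) {
    if (!m.ready.load(std::memory_order_acquire)) {  // pairs with the release store
        return false;
    }
    out = m.payload;                                 // plain loads, safe after the acquire
    return true;
}
```

Because the consumer reads the payload only after seeing the flag, it does not matter whether the 16-byte copy is performed as one instruction or as two halves. Reusing the mailbox for repeated updates would need a more elaborate protocol (for example a sequence counter or a lock), which this sketch deliberately leaves out.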
[1] I.e., those not listed in the exception list in the accepted answer: NT stores, clflush, and string operations.