This is the best introduction to MMX/SSE programming I ever found. (I've programmed SSE2 for 5 years and I still find this tutorial to be the most conceptually clear.)
http://www.tommesani.com/Docs.html
This is not a complete list of instructions; so once you're ready to learn more, do start reading the Intel intrinsics guide as @PaulR suggests.
One important thing to keep in mind is that MMX/SSE tend to be severely limiting in terms of movements of data (shuffle or arbitrary permutation, or change of single element). This is a limitation of CPU silicon design. Scatter-gather instructions were only added a few years ago, and might not even be available on your customer's computers.
There is a large repertoire of vectorization tricks for MMX/SSE similar to the way http://www.hackersdelight.org/ prescribes tricks for exploiting bit-parallel operations.