A while ago, I stumbled upon this 2001 DDJ article by Alexandrescu: http://www.ddj.com/cpp/184403799
It's about comparing various ways to initialize
It depends on what you're doing. If you have a very specific case, you can often vastly outperform the system libc's memset and memcpy (and/or the compiler's inlined versions of them).
For example, for the program I work on, I wrote a 16-byte-aligned memcpy and memset designed for small data sizes. The memcpy was made for multiple-of-16 sizes greater than or equal to 64 only (with data aligned to 16), and memset was made for multiple-of-128 sizes only. These restrictions allowed me to get enormous speed, and since I controlled the application, I could tailor the functions specifically to what was needed, and also tailor the application to align all necessary data.
The memcpy performed at about 8-9x the speed of the Windows native memcpy, knocking a 460-byte copy down to a mere 50 clock cycles. The memset was about 2.5x faster, zero-filling a stack array extremely quickly.
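To give a rough idea of what such restricted routines can look like, here is a minimal sketch using SSE2 intrinsics. It is not the code from the program mentioned above: the names `copy_aligned16` and `zero_aligned128` are hypothetical, and it assumes an x86 target with SSE2 and that the caller really does honour the size and alignment restrictions. No performance figure is implied.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n bytes, where n is a multiple of 16,
   n >= 64, and both pointers are 16-byte aligned. */
void copy_aligned16(void *dst, const void *src, size_t n)
{
    assert(n >= 64 && n % 16 == 0);
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t blocks = n / 16;

    /* Main loop: four aligned 16-byte loads, then four aligned stores. */
    while (blocks >= 4) {
        __m128i a = _mm_load_si128(s + 0);
        __m128i b = _mm_load_si128(s + 1);
        __m128i c = _mm_load_si128(s + 2);
        __m128i e = _mm_load_si128(s + 3);
        _mm_store_si128(d + 0, a);
        _mm_store_si128(d + 1, b);
        _mm_store_si128(d + 2, c);
        _mm_store_si128(d + 3, e);
        s += 4;
        d += 4;
        blocks -= 4;
    }

    /* Tail: the remaining 0 to 3 blocks of 16 bytes. */
    for (; blocks > 0; --blocks) {
        _mm_store_si128(d++, _mm_load_si128(s++));
    }
}

/* Hypothetical helper: zero n bytes, where n is a multiple of 128
   and dst is 16-byte aligned. */
void zero_aligned128(void *dst, size_t n)
{
    assert(n % 128 == 0);
    assert((uintptr_t)dst % 16 == 0);

    const __m128i zero = _mm_setzero_si128();
    __m128i *d = (__m128i *)dst;

    /* Each iteration writes 8 x 16 = 128 bytes with no tail handling. */
    for (size_t i = 0; i < n / 128; ++i, d += 8) {
        _mm_store_si128(d + 0, zero);
        _mm_store_si128(d + 1, zero);
        _mm_store_si128(d + 2, zero);
        _mm_store_si128(d + 3, zero);
        _mm_store_si128(d + 4, zero);
        _mm_store_si128(d + 5, zero);
        _mm_store_si128(d + 6, zero);
        _mm_store_si128(d + 7, zero);
    }
}
```

The point of the restrictions is that the functions need no alignment fix-up and no byte-wise tail, which is where a general-purpose memcpy spends much of its time on small sizes. Whether this beats your libc is something you would have to measure on your own hardware.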
If you're interested in these functions, they can be found here; drop down to around line 600 for the memcpy and memset. They're rather trivial. Note they're designed for small buffers that are supposed to be in cache; if you want to initialize enormous amounts of data in memory while bypassing cache, your issue may be more complex.
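For the large-buffer case, one common approach on x86 is to use non-temporal (streaming) stores, which write to memory without pulling the destination into cache. The sketch below is only an illustration of that idea, not part of the functions described above; `zero_streaming` is a hypothetical name, and it assumes SSE2, a 16-byte-aligned destination, and a size that is a multiple of 16.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero n bytes using non-temporal stores.
   Requires n to be a multiple of 16 and dst to be 16-byte aligned. */
void zero_streaming(void *dst, size_t n)
{
    assert(n % 16 == 0);
    assert((uintptr_t)dst % 16 == 0);

    const __m128i zero = _mm_setzero_si128();
    __m128i *d = (__m128i *)dst;

    for (size_t i = 0; i < n / 16; ++i) {
        /* Streaming store: hints the CPU not to cache the written line. */
        _mm_stream_si128(d + i, zero);
    }

    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
}
```

Streaming stores tend to help only when the buffer is much larger than the cache and will not be read back soon; for small buffers that are already in cache they are usually slower than ordinary stores.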