A while ago, I stumbled upon this 2001 DDJ article by Alexandrescu: http://www.ddj.com/cpp/184403799
It's about comparing various ways to initialize memory.
d) Accept that trying to play "jedi mind tricks" with initialization will cost more programmer hours than the cumulative milliseconds saved by some obscure-but-fast method over something obvious and clear.
You can take a look at liboil: they (try to) provide different implementations of the same function and choose the fastest one at initialization. Liboil has a pretty liberal licence, so you can also use it in proprietary software.
http://liboil.freedesktop.org/
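A minimal sketch of the "pick the fastest implementation at start-up" idea in plain C. This is not liboil's API: the names `fill_fn`, `fill_bytewise`, `fill_memset`, `fill_impl` and `select_fastest_fill` are hypothetical, and timing each candidate with `clock()` over a fixed scratch buffer is an assumption chosen for simplicity.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by every candidate implementation (hypothetical). */
typedef void (*fill_fn)(unsigned char *dst, unsigned char value, size_t n);

/* Candidate 1: naive byte-by-byte loop. */
static void fill_bytewise(unsigned char *dst, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = value;
}

/* Candidate 2: defer to the C library. */
static void fill_memset(unsigned char *dst, unsigned char value, size_t n)
{
    memset(dst, value, n);
}

/* Pointer used by the rest of the program; chosen once at start-up. */
static fill_fn fill_impl = fill_memset;

/* Time each candidate on a scratch buffer and keep the fastest one. */
static void select_fastest_fill(void)
{
    static unsigned char scratch[1 << 16];
    fill_fn candidates[] = { fill_bytewise, fill_memset };
    clock_t best = 0;
    int have_best = 0;

    for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; ++c) {
        clock_t start = clock();
        for (int rep = 0; rep < 200; ++rep)
            candidates[c](scratch, (unsigned char)rep, sizeof scratch);
        clock_t elapsed = clock() - start;
        if (!have_best || elapsed < best) {
            best = elapsed;
            fill_impl = candidates[c];
            have_best = 1;
        }
    }
}
```

The rest of the program calls `fill_impl(...)`; since the choice is made once, the per-call cost is just an indirect call. An optimizing compiler may turn the byte loop into a `memset` call anyway, so the two candidates can end up timing about the same.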
I would always choose an initialization method that is part of the runtime or OS I am using (memset) or, in the worst case, one that is part of a library I am already using.
Why: if you implement your own initialization, you might end up with a marginally better solution now, but in a couple of years the runtime will likely have improved, and you don't want to duplicate the work of the people maintaining the runtime.
All this holds if the improvement is marginal. If there is an order-of-magnitude difference between memset and your own initialization, then it makes sense to use your own code, but I really doubt that will be the case.
The DDJ article acknowledges that memset is the best answer, and much faster than what he was trying to achieve:
There is something sacrosanct about C's memory manipulation functions memset, memcpy, and memcmp. They are likely to be highly optimized by the compiler vendor, to the extent that the compiler might detect calls to these functions and replace them with inline assembler instructions — this is the case with MSVC.
So, if memset works for you (i.e., you are initializing with a single byte value), use it.
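Because memset writes the same byte into every position, it can only produce element values whose bytes are all identical. For an array of 32-bit `int`, `memset(a, 0, ...)` gives 0 and `memset(a, 0xFF, ...)` gives -1, but `memset(a, 1, ...)` gives 0x01010101 (16843009), not 1. The hypothetical helpers below (`is_byte_repeated`, `fill_ints`) check that condition and fall back to a plain loop otherwise; this is a sketch, not code from the article.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if every byte of v is identical, i.e. memset can produce v. */
static int is_byte_repeated(int v)
{
    unsigned char bytes[sizeof v];
    memcpy(bytes, &v, sizeof v);
    for (size_t i = 1; i < sizeof v; ++i)
        if (bytes[i] != bytes[0])
            return 0;
    return 1;
}

/* Fill n ints with v: use memset when a single byte pattern suffices
   (e.g. 0 or -1), otherwise fall back to an element-wise loop. */
static void fill_ints(int *dst, int v, size_t n)
{
    if (is_byte_repeated(v)) {
        memset(dst, (unsigned char)v, n * sizeof *dst);
    } else {
        for (size_t i = 0; i < n; ++i)
            dst[i] = v;
    }
}
```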
Whilst every millisecond may count, you should first establish what percentage of your execution time is spent setting memory. It is likely very low (1 or 2%?), given that your program has useful work to do as well. Given that, the optimization effort would likely have a much better return elsewhere.
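As a rough first check before reaching for a real profiler, you can time the clearing and the work separately. The sketch below assumes a hypothetical workload (`do_work`, a simple checksum) and a hypothetical `memset_fraction` helper, and uses `clock()`, which is coarse; treat the resulting fraction as an estimate only.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical "useful work": a simple checksum over the buffer. */
static unsigned long do_work(const unsigned char *buf, size_t n)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum = sum * 31 + buf[i];
    return sum;
}

/* Run `iterations` rounds of clear-then-work and return the fraction of
   CPU time spent clearing. `sink` keeps the work from being optimized
   away. Rough numbers only: clock() has limited resolution. */
static double memset_fraction(unsigned char *buf, size_t n, int iterations,
                              unsigned long *sink)
{
    clock_t in_memset = 0, in_work = 0;
    for (int it = 0; it < iterations; ++it) {
        clock_t t0 = clock();
        memset(buf, 0, n);
        clock_t t1 = clock();
        buf[it % n] = (unsigned char)it;  /* make each round differ */
        *sink += do_work(buf, n);
        clock_t t2 = clock();
        in_memset += t1 - t0;
        in_work += t2 - t1;
    }
    clock_t total = in_memset + in_work;
    return total ? (double)in_memset / (double)total : 0.0;
}
```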
It depends on what you're doing. If you have a very specific case, you can often vastly outperform the system libc's memset and memcpy (and/or the compiler's inlined versions of them).
For example, in the program I work on, I wrote a 16-byte-aligned memcpy and memset designed for small data sizes. The memcpy was made only for sizes that are multiples of 16 and at least 64 bytes (with 16-byte-aligned data), and the memset only for sizes that are multiples of 128. These restrictions let me get enormous speedups, and since I controlled the application, I could tailor the functions to exactly what was needed and also tailor the application to align all the necessary data.
The memcpy ran at about 8-9x the speed of the native Windows memcpy, knocking a 460-byte copy down to a mere 50 clock cycles. The memset was about 2.5x faster, zero-filling a stack array extremely quickly.
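The linked code isn't reproduced here, but a sketch of the same kind of specialization, assuming x86 SSE2 intrinsics from `<emmintrin.h>`, might look like this. The function names are hypothetical; the preconditions (16-byte alignment, fill sizes that are multiples of 128, copy sizes that are multiples of 16 and at least 64) mirror the restrictions described above and are checked with `assert`. Without SSE2 it falls back to the library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical specialized fill: dst must be 16-byte aligned and n a
   multiple of 128, so no head/tail handling is needed. */
static void memset_a16_x128(void *dst, unsigned char value, size_t n)
{
    assert(((uintptr_t)dst & 15) == 0 && n % 128 == 0);
#ifdef HAVE_SSE2
    const __m128i v = _mm_set1_epi8((char)value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < n / 16; i += 8) {
        /* 8 aligned 16-byte stores = 128 bytes per iteration. */
        for (size_t k = 0; k < 8; ++k)
            _mm_store_si128(p + i + k, v);
    }
#else
    memset(dst, value, n);
#endif
}

/* Hypothetical specialized copy: both pointers 16-byte aligned, n a
   multiple of 16 and at least 64 (this simple loop does not need the
   lower bound; it is kept to mirror the described contract). */
static void memcpy_a16_x16(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
    assert(n % 16 == 0 && n >= 64);
#ifdef HAVE_SSE2
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; ++i)
        _mm_store_si128(d + i, _mm_load_si128(s + i));
#else
    memcpy(dst, src, n);
#endif
}
```

Skipping the alignment and size handling that a general-purpose memset has to do is plausibly where a routine like this gains on small, cache-resident buffers.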
If you're interested in these functions, they can be found here; drop down to around line 600 for the memcpy and memset. They're rather trivial. Note they're designed for small buffers that are supposed to be in cache; if you want to initialize enormous amounts of data in memory while bypassing cache, your issue may be more complex.
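For the large-buffer case, one common technique on x86 is to use non-temporal (streaming) stores, which write to memory without pulling the destination lines into the cache. A minimal sketch, assuming SSE2, a 16-byte-aligned destination and a size that is a multiple of 16; the function name `memset_stream` is hypothetical, and whether it actually helps depends on the buffer size and what else needs the cache, so measure it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical large-buffer fill using non-temporal stores.
   Assumes dst is 16-byte aligned and n is a multiple of 16. */
static void memset_stream(void *dst, unsigned char value, size_t n)
{
    assert(((uintptr_t)dst & 15) == 0 && n % 16 == 0);
#ifdef HAVE_SSE2
    const __m128i v = _mm_set1_epi8((char)value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < n / 16; ++i)
        _mm_stream_si128(p + i, v);  /* store without allocating cache lines */
    _mm_sfence();  /* order the weakly-ordered streaming stores before later stores */
#else
    memset(dst, value, n);
#endif
}
```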
The year isn't 2001 anymore. Since then, new versions of Visual Studio have appeared, and I've taken the time to study their memset. It uses SSE (if available, of course). If your old code was correct, statistically it will now be faster, although you might hit an unfortunate corner case. I expect the same from GCC, although I haven't studied its code: it's a fairly obvious improvement, and GCC is an open-source compiler, so someone will have created the patch.