State of “memset” functionality in C++ with modern compilers

我寻月下人不归  2021-02-12 16:03

Context:

A while ago, I stumbled upon this 2001 DDJ article by Alexandrescu: http://www.ddj.com/cpp/184403799

It's about comparing various ways to initialize a buffer of memory.
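
For illustration only (the article's own benchmark code is not reproduced here), these are two of the usual candidates such comparisons look at when filling a raw byte buffer:

    #include <algorithm>
    #include <cstring>

    int main()
    {
        char buf[4096];

        std::memset(buf, 0, sizeof buf);         // C library call
        std::fill(buf, buf + sizeof buf, '\0');  // standard algorithm

        return static_cast<int>(buf[0]);         // read a byte back so the buffer is used
    }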

12 Answers
  •  旧时难觅i
    2021-02-12 16:26

    It depends on what you're doing. If you have a very specific case, you can often vastly outperform the system libc's memset and memcpy (and/or the compiler's inlined versions).

    For example, for the program I work on, I wrote a 16-byte-aligned memcpy and memset designed for small data sizes. The memcpy was written only for multiple-of-16 sizes of at least 64 bytes (with data aligned to 16 bytes), and the memset only for multiple-of-128 sizes. These restrictions allowed me to get enormous speedups, and since I controlled the application, I could tailor the functions specifically to what was needed, and also tailor the application to align all the necessary data.

    The memcpy performed at about 8-9x the speed of the Windows native memcpy, knocking a 460-byte copy down to a mere 50 clock cycles. The memset was about 2.5x faster, filling a stack array with zeros extremely quickly.

    If you're interested in these functions, they can be found here; drop down to around line 600 for the memcpy and memset. They're rather trivial; a sketch along the same lines is given below. Note that they're designed for small buffers that are supposed to be in cache; if you want to initialize enormous amounts of memory while bypassing the cache, the problem becomes more complex.
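
    The following sketch is hypothetical (the answerer's actual code was only linked, not quoted here) and just illustrates the kind of restricted routine described above, using SSE2 intrinsics: a fill that assumes a 16-byte-aligned destination and a multiple-of-128 size, and a copy that assumes 16-byte-aligned pointers and a multiple-of-16 size of at least 64 bytes.

        #include <emmintrin.h>  // SSE2 intrinsics
        #include <cassert>
        #include <cstddef>
        #include <cstdint>

        // Fill 'size' bytes at 'dst' with 'value'.
        // Assumes: dst is 16-byte aligned, size is a multiple of 128.
        void fill_aligned128(void* dst, unsigned char value, std::size_t size)
        {
            assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
            assert(size % 128 == 0);

            const __m128i v = _mm_set1_epi8(static_cast<char>(value));
            __m128i* p = static_cast<__m128i*>(dst);

            // Each iteration stores 8 x 16 = 128 bytes; the size restriction
            // is what makes this fixed unrolling possible.
            for (std::size_t n = size / 128; n != 0; --n, p += 8) {
                _mm_store_si128(p + 0, v);
                _mm_store_si128(p + 1, v);
                _mm_store_si128(p + 2, v);
                _mm_store_si128(p + 3, v);
                _mm_store_si128(p + 4, v);
                _mm_store_si128(p + 5, v);
                _mm_store_si128(p + 6, v);
                _mm_store_si128(p + 7, v);
            }
        }

        // Copy 'size' bytes from 'src' to 'dst'.
        // Assumes: both pointers are 16-byte aligned, size is a multiple of 16
        // and at least 64.
        void copy_aligned16(void* dst, const void* src, std::size_t size)
        {
            assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
            assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
            assert(size % 16 == 0 && size >= 64);

            __m128i* d = static_cast<__m128i*>(dst);
            const __m128i* s = static_cast<const __m128i*>(src);
            std::size_t blocks = size / 16;

            // Main loop: 4 x 16 = 64 bytes per iteration.
            for (; blocks >= 4; blocks -= 4, d += 4, s += 4) {
                const __m128i a = _mm_load_si128(s + 0);
                const __m128i b = _mm_load_si128(s + 1);
                const __m128i c = _mm_load_si128(s + 2);
                const __m128i e = _mm_load_si128(s + 3);
                _mm_store_si128(d + 0, a);
                _mm_store_si128(d + 1, b);
                _mm_store_si128(d + 2, c);
                _mm_store_si128(d + 3, e);
            }
            // Remaining 16-byte blocks (0-3 of them).
            for (; blocks != 0; --blocks, ++d, ++s) {
                _mm_store_si128(d, _mm_load_si128(s));
            }
        }

    Whether something like this actually beats the platform memset/memcpy depends on the compiler, the CPU, and how hot the data is in cache, so it would need to be measured in context.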
