State of “memset” functionality in C++ with modern compilers

我寻月下人不归 2021-02-12 16:03

Context:

A while ago, I stumbled upon this 2001 DDJ article by Alexandrescu: http://www.ddj.com/cpp/184403799

It's about comparing various ways to initialize

12 answers
  • 2021-02-12 16:08

    If you have to allocate your memory as well as initialize it, I would:

    • Use calloc instead of malloc
    • Change as many of my default values to zero as possible (e.g. let my default enumeration value be zero; or, if a boolean variable's default value is 'true', store its inverse value in the structure)

    The reason for this is that calloc zero-initializes memory for you. While this still involves the overhead of zeroing memory, most implementations are likely to have this routine highly optimized -- more optimized than malloc/new followed by a call to memset.
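    A minimal sketch of the two points above, assuming a hypothetical `Settings` record whose defaults are all encoded as zero. The type, its fields, and the helper names are made up for illustration; whether calloc actually beats malloc plus memset depends on the C library and the OS, so measure before relying on it.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical record: every default value is represented by zero.
enum class Mode { Default = 0, Fast = 1 };

struct Settings {
    Mode mode;      // Mode::Default == 0
    bool disabled;  // inverse of an "enabled" flag whose default is true
    int  retries;   // 0 means "use the built-in retry count"
};

// The logical flag is recovered by inverting the stored value.
inline bool is_enabled(const Settings& s) { return !s.disabled; }

// One call: allocate n records that are already zero-initialized.
inline Settings* make_settings_calloc(std::size_t n) {
    return static_cast<Settings*>(std::calloc(n, sizeof(Settings)));
}

// Two-step equivalent, shown only for comparison: malloc, then memset.
// (No overflow check on n * sizeof(Settings); calloc performs one itself.)
inline Settings* make_settings_malloc(std::size_t n) {
    void* p = std::malloc(n * sizeof(Settings));
    if (p != nullptr) {
        std::memset(p, 0, n * sizeof(Settings));
    }
    return static_cast<Settings*>(p);
}
```

    Both helpers return memory that must be released with `std::free`, and this only works for trivial types like the one above.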

  • 2021-02-12 16:09

    If memory is not a problem, then precreate a static buffer of the size you need, initialized to your value(s). As far as I know, both of these compilers are optimizing compilers, so if you use a simple for loop, the compiler should generate near-optimal instructions to copy the buffer across.

    If memory is a problem, use a smaller buffer and copy it across at sizeof(..) offsets into the new buffer.
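    The smaller-buffer variant could look roughly like this. `Cell`, its default value, the block size `kBlock`, and the function name are all assumptions made for the sketch, not anything from the question.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical element type with a non-zero default value.
struct Cell {
    int   id;
    float weight;
};

constexpr std::size_t kBlock = 64;  // template size in elements (arbitrary)

// Fill dst[0..count) by repeatedly copying a small pre-initialized block.
inline void fill_from_template(Cell* dst, std::size_t count) {
    // Built once on first use; local static initialization is thread-safe.
    static const std::array<Cell, kBlock> block = [] {
        std::array<Cell, kBlock> b{};
        b.fill(Cell{-1, 1.0f});
        return b;
    }();

    std::size_t done = 0;
    while (done < count) {
        const std::size_t left = count - done;
        const std::size_t n = left < kBlock ? left : kBlock;
        // Copy n elements at an offset of done * sizeof(Cell) bytes.
        std::memcpy(dst + done, block.data(), n * sizeof(Cell));
        done += n;
    }
}
```

    For a type this simple, `std::fill_n(dst, count, Cell{-1, 1.0f})` is the obvious baseline to benchmark against.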

    HTH

  • 2021-02-12 16:10

    As always with these types of questions, the problem is constrained by factors outside of your control, namely, the bandwidth of the memory. And if the host OS decides to start paging the memory then things get far worse. On Win32 platforms, the memory is paged and pages are only allocated on first use which will generate a big pause every page boundary whilst the OS finds a page to use (this may require another process' page to be paged to disk).

    This, however, is the absolute fastest memset ever written:

    void memset (void *memory, size_t size, byte value)
    {
        // Intentionally empty: the fastest memset is the one that does no work.
    }
    

    Not doing something is always the fastest way. Is there any way the algorithms can be written to avoid the initial memset? What are the algorithms you're using?

  • 2021-02-12 16:14

    The MASM Forum has a lot of incredible assembly language programmers/hobbyists who have beaten this issue completely to death (have a look through The Laboratory). The results were much like Christopher's response: SSE is incredible for large, aligned buffers, but as the size shrinks you eventually reach a point where a basic for loop is just as quick.

  • 2021-02-12 16:14

    Well, this all depends on your problem domain and your specifications. Have you run into performance issues, failed to meet a timing deadline, and pinpointed memset as the root of all evil? If so, you're in the one and only case where you could consider some memset tuning.

    Then you should also keep in mind that memset's performance will vary with the hardware and the platform it runs on. During those five years, will the software run on the same platform? On the same architecture? Once you have settled that, you can try to 'roll your own' memset, typically by playing with the alignment of buffers and making sure you zero 32-bit values at once, depending on what performs best on your architecture.
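    As a rough illustration of what 'rolling your own' means here, the sketch below fills byte by byte up to a 4-byte boundary, then writes one 32-bit word per iteration, then finishes the tail. The name `fill_bytes_32` is hypothetical, and an optimizing compiler may well turn the loops back into a call to the library memset, so treat it as a starting point for measurements only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical hand-rolled fill using aligned 32-bit stores.
inline void* fill_bytes_32(void* dst, unsigned char value, std::size_t size) {
    unsigned char* p = static_cast<unsigned char*>(dst);

    // Head: single bytes until p sits on a 4-byte boundary.
    while (size > 0 && reinterpret_cast<std::uintptr_t>(p) % 4 != 0) {
        *p++ = value;
        --size;
    }

    // Body: replicate the byte into all four bytes of a 32-bit word.
    const std::uint32_t word = 0x01010101u * value;
    while (size >= 4) {
        // memcpy of a fixed 4 bytes avoids aliasing problems and is
        // normally compiled to a single 32-bit store.
        std::memcpy(p, &word, 4);
        p += 4;
        size -= 4;
    }

    // Tail: the remaining 0 to 3 bytes.
    while (size > 0) {
        *p++ = value;
        --size;
    }
    return dst;
}
```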

    I once ran into the same thing with memcmp, where the alignment overhead caused some problems, but typically this will not result in miracles, only a small improvement, if any. If you're missing your requirements by an order of magnitude, then this won't get you any further.

  • 2021-02-12 16:16

    Memset/memcpy are mostly written with a basic instruction set in mind, and so can be outperformed by specialized SSE routines, which on the other hand enforce certain alignment constraints.

    But to reduce it to a list:

    1. For data-sets <= several hundred kilobytes memcpy/memset perform faster than anything you could mock up.
    2. For data-sets > megabytes use a combination of memcpy/memset to get the alignment and then use your own SSE optimized routines/fallback to optimized routines from Intel etc.
    3. Enforce the alignment at the start up and use your own SSE-routines.

    This list only comes into play where you actually need the performance. Data sets that are too small, or that are initialized only once, are not worth the hassle.
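    A sketch of items 1 and 2 combined, assuming an x86 target with SSE2: small buffers go straight to memset, large ones use memset to reach a 16-byte boundary and then aligned 128-bit stores. The 1 MiB threshold, the `FILL_HAS_SSE2` macro, and the function name are assumptions for illustration; on other targets the code falls back to plain memset.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define FILL_HAS_SSE2 1
#else
#define FILL_HAS_SSE2 0
#endif

// Size above which the SSE2 path is tried (arbitrary; tune by measuring).
constexpr std::size_t kSseThreshold = 1u << 20;

inline void fill_large(void* dst, unsigned char value, std::size_t size) {
#if FILL_HAS_SSE2
    if (size >= kSseThreshold) {
        unsigned char* p = static_cast<unsigned char*>(dst);

        // Head: let memset handle the 0 to 15 bytes before a 16-byte boundary.
        const std::size_t head =
            (16 - reinterpret_cast<std::uintptr_t>(p) % 16) % 16;
        std::memset(p, value, head);
        p += head;
        size -= head;

        // Body: aligned 16-byte stores of the replicated byte.
        const __m128i v = _mm_set1_epi8(static_cast<char>(value));
        while (size >= 16) {
            _mm_store_si128(reinterpret_cast<__m128i*>(p), v);
            p += 16;
            size -= 16;
        }

        // Tail: the remaining 0 to 15 bytes.
        std::memset(p, value, size);
        return;
    }
#endif
    // Small buffers, or no SSE2: the library routine is the better choice.
    std::memset(dst, value, size);
}
```

    Swapping `_mm_store_si128` for the non-temporal `_mm_stream_si128` is a common variation for buffers that will not be read back soon, but whether it helps is again something to measure.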

    Here is an implementation of memcpy from AMD; I can't find the article that described the concept behind the code.
