Memory Allocation/Deallocation Bottleneck?

轮回少年 2020-11-30 20:44

How much of a bottleneck is memory allocation/deallocation in typical real-world programs? Answers from any type of program where performance typically matters are welcome.

12 Answers
  • 2020-11-30 21:22

    According to the MicroQuill SmartHeap Technical Specification, "a typical application [...] spends 40% of its total execution time on managing memory". You can take this figure as an upper bound; I personally feel that a typical application spends more like 10-15% of its execution time allocating and deallocating memory. It is rarely a bottleneck in a single-threaded application.

    In multithreaded C/C++ applications, standard allocators become an issue due to lock contention. This is where you start to look for more scalable solutions. But keep in mind Amdahl's Law.
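    A minimal, hedged sketch of how you might measure that contention yourself, in portable C++ (the 64-byte block size and the iteration count are arbitrary assumptions): run it with 1 thread and then with several, and compare wall-clock times.

    #include <chrono>
    #include <cstdlib>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Each thread repeatedly allocates and frees a small block. If the
    // allocator serializes on a global lock, total time grows with the
    // thread count instead of staying roughly flat.
    static void churn(std::size_t iterations)
    {
        for (std::size_t i = 0; i < iterations; ++i) {
            void *p = std::malloc(64);
            if (p) *static_cast<volatile char *>(p) = 1; // defeat dead-allocation elimination
            std::free(p);
        }
    }

    int main(int argc, char **argv)
    {
        const unsigned threads = (argc > 1) ? std::strtoul(argv[1], nullptr, 10) : 4;
        const std::size_t iterations = 1000000;

        const auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (unsigned t = 0; t < threads; ++t)
            pool.emplace_back(churn, iterations);
        for (auto &th : pool)
            th.join();
        const auto stop = std::chrono::steady_clock::now();

        std::cout << threads << " threads: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
                  << " ms\n";
        return 0;
    }

    If the total time grows sharply with the thread count, your default allocator is serializing, and a more scalable allocator (tcmalloc, jemalloc, Hoard) is worth trying.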

  • 2020-11-30 21:23

    I know I answered earlier; however, that was an answer to the other answers, not to your question.

    To speak to you directly: if I understand correctly, your performance criterion is throughput.

    This, to me, means that you should be looking almost exclusively at NUMA-aware allocators.

    None of the earlier references (the IBM JVM paper, MicroQuill C, the Sun JVM) covers this point, so I am highly suspicious of their applicability today, when, at least on the AMD ABI, NUMA is the pre-eminent memory/CPU governor.

    Hands down: real world, fake world, whatever world... NUMA-aware memory request/use technologies are faster. Unfortunately, I'm running Windows currently, and I have not found the "numastat" utility that is available in Linux.

    A friend of mine has written about this in depth in his implementation for the FreeBSD kernel.

    Despite my being able to show, ad hoc, the typically VERY large number of local-node memory requests on top of the remote-node ones (underscoring the obvious performance/throughput advantage), you can surely benchmark this yourself, and that is likely what you will need to do, as your performance characteristics are going to be highly specific.

    I do know that, in a lot of ways, at least the earlier 5.x VMware fared rather poorly, at that time at least, by not taking advantage of NUMA and frequently demanding pages from the remote node. However, VMs are a very unique beast when it comes to memory compartmentalization or containerization.

    One of the references I cited is Microsoft's API implementation for the AMD ABI, which has NUMA-specialized allocation interfaces for user-land application developers to exploit ;)
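    For illustration, a minimal sketch of that interface in use. VirtualAllocExNuma is the relevant Win32 call; the preferred node of 0 is an assumption about the machine, and error handling is kept minimal.

    #include <windows.h>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        const DWORD preferredNode = 0;   // assumption: the machine has a node 0
        const SIZE_T size = 1 << 20;     // 1 MiB

        // Reserve and commit memory with a preferred NUMA node.
        void *p = VirtualAllocExNuma(GetCurrentProcess(), nullptr, size,
                                     MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE,
                                     preferredNode);
        if (!p) {
            std::printf("VirtualAllocExNuma failed: %lu\n", GetLastError());
            return 1;
        }

        // Physical pages are placed on first touch, so touch them here.
        std::memset(p, 0, size);

        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }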

    Here's a fairly recent analysis, visuals and all, from some browser add-on developers who compare four different heap implementations. Naturally, the one they developed comes out on top (odd how the people who do the testing often exhibit the highest scores).

    They do cover, in some ways quantifiably, at least for their use case, what the exact space/time trade-off is. Generally, they identified that the LFH (which, by the way, is apparently simply a mode of the standard heap) or a similarly designed approach consumes significantly more memory up front, but over time may wind up using less... the graphics are neat too...
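    As an aside, opting a Win32 heap into the LFH mode takes a single HeapSetInformation call. A minimal sketch (note that on Windows Vista and later the LFH is already the default, so this mostly matters on older systems):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HANDLE heap = HeapCreate(0, 0, 0);  // a private, growable heap
        ULONG mode = 2;                     // 2 == enable the Low Fragmentation Heap
        if (!HeapSetInformation(heap, HeapCompatibilityInformation,
                                &mode, sizeof(mode)))
            std::printf("HeapSetInformation failed: %lu\n", GetLastError());

        void *p = HeapAlloc(heap, 0, 128);  // then allocate from it as usual
        HeapFree(heap, 0, p);
        HeapDestroy(heap);
        return 0;
    }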

    I would think, however, that selecting a heap implementation based on your typical workload, after you understand it well ;), is a good idea. But to understand your needs well, first make sure your basic operations are correct before you optimize these odds and ends ;)

  • 2020-11-30 21:24

    Nearly every high-performance application now has to use threads to exploit parallel computation. This is where the real memory allocation speed killer comes in when writing C/C++ applications.

    In a C or C++ application, malloc/new must take a lock on the global heap for every operation. Even without contention, locks are far from free and should be avoided as much as possible.

    Java and C# handle this better because threading was designed in from the start, and their memory allocators work from per-thread pools. This can be done in C/C++ as well, but it isn't automatic.
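    A minimal sketch of the per-thread-pool idea, assuming fixed-size blocks that are freed by the same thread that allocated them (real allocators such as tcmalloc and jemalloc handle cross-thread frees and much more):

    #include <cstddef>
    #include <cstdlib>

    // Every freed block becomes a node on the current thread's free list.
    struct FreeNode { FreeNode *next; };

    constexpr std::size_t kBlockSize = 64;         // fixed block size (assumption)
    thread_local FreeNode *tl_free_list = nullptr;

    void *pool_alloc()
    {
        if (FreeNode *node = tl_free_list) {       // fast path: no lock taken
            tl_free_list = node->next;
            return node;
        }
        return std::malloc(kBlockSize);            // slow path: hit the shared heap
    }

    void pool_free(void *p)
    {
        auto *node = static_cast<FreeNode *>(p);   // recycle onto this thread's list
        node->next = tl_free_list;
        tl_free_list = node;
    }

    The fast path touches only thread-local state, so no lock is taken at all; only the fallback path hits the shared heap.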

  • 2020-11-30 21:27

    In terms of performance, allocating and releasing memory are relatively costly operations. The calls in modern operating systems have to go all the way down to the kernel so that the operating system can deal with virtual memory, paging/mapping, execution protection, etc.

    On the other hand, almost all modern programming languages hide these operations behind "allocators" that work with pre-allocated buffers.

    This concept is also used by most applications which have a focus on throughput.
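    A minimal sketch of that idea: a bump ("arena") allocator that carves allocations out of one pre-allocated buffer, so the kernel is involved only once up front (the class name and the 16-byte alignment are illustrative assumptions):

    #include <cstddef>
    #include <cstdlib>

    // One malloc up front; afterwards every allocation just bumps an offset.
    class Arena
    {
    public:
        explicit Arena(std::size_t capacity)
            : base_(static_cast<char *>(std::malloc(capacity))),
              offset_(0), capacity_(capacity) {}
        ~Arena() { std::free(base_); }

        void *allocate(std::size_t n)
        {
            const std::size_t aligned = (offset_ + 15) & ~std::size_t{15}; // 16-byte align
            if (!base_ || aligned + n > capacity_)
                return nullptr;                                            // out of space
            offset_ = aligned + n;
            return base_ + aligned;
        }

        void reset() { offset_ = 0; }  // "free" everything in one cheap step

    private:
        char *base_;
        std::size_t offset_;
        std::size_t capacity_;
    };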

  • 2020-11-30 21:29

    In general, the cost of memory allocation is probably dwarfed by lock contention, algorithmic complexity, or other performance issues in most applications. All in all, I'd say this is probably not in the top 10 of performance issues I'd worry about.

    Now, grabbing very large chunks of memory might be an issue. And grabbing memory but not properly releasing it is something I'd worry about.

    In Java and JVM-based languages, new'ing objects is now very, very, very fast: each thread typically allocates by bumping a pointer inside its own thread-local allocation buffer, so the common case involves no locking at all.

    Here's one decent article by a guy who knows his stuff with some references at the bottom to more related links: http://www.ibm.com/developerworks/java/library/j-jtp09275.html

  • 2020-11-30 21:32

    This is where C/C++'s memory allocation system works best. The default allocation strategy is OK for most cases, but it can be changed to suit whatever is needed. In GC'd systems there's not a lot you can do to change allocation strategies. Of course, there is a price to pay: the need to track allocations and free them correctly. C++ takes this further, and the allocation strategy can be specified per class by overloading operator new:

    #include <cstddef> // for std::size_t

    class AClass
    {
    public:
      void *operator new (std::size_t size);     // called whenever there's a new AClass
      void *operator new [] (std::size_t size);  // called whenever there's a new AClass [n]
      void operator delete (void *memory);       // if you define new, you really need to define delete as well
      void operator delete [] (void *memory);
    };
    

    Many of the STL templates allow you to define custom allocators as well.
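    For example, here is a minimal C++11-style allocator that standard containers can use. The malloc-backed body is a placeholder; a real one would draw from a pool:

    #include <cstdlib>
    #include <new>
    #include <vector>

    // Minimal C++11 allocator; std::allocator_traits fills in the rest.
    template <typename T>
    struct PoolAllocator
    {
        using value_type = T;

        PoolAllocator() = default;
        template <typename U>
        PoolAllocator(const PoolAllocator<U> &) {}   // rebinding copy

        T *allocate(std::size_t n)
        {
            if (void *p = std::malloc(n * sizeof(T)))
                return static_cast<T *>(p);
            throw std::bad_alloc{};
        }
        void deallocate(T *p, std::size_t) { std::free(p); }
    };

    template <typename T, typename U>
    bool operator==(const PoolAllocator<T> &, const PoolAllocator<U> &) { return true; }
    template <typename T, typename U>
    bool operator!=(const PoolAllocator<T> &, const PoolAllocator<U> &) { return false; }

    int main()
    {
        std::vector<int, PoolAllocator<int>> v;  // the container now draws from our allocator
        v.push_back(42);
        return 0;
    }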

    As with all things to do with optimisation, you must first determine, through run-time analysis, whether memory allocation really is the bottleneck before writing your own allocators.
