Memory Allocation/Deallocation Bottleneck?

后端未结

关注

 12  1941

How much of a bottleneck is memory allocation/deallocation in typical real-world programs? Answers from any type of program where performance typically matters are welcome.

相关标签:

12条回答

故里飘歌

2020-11-30 21:07

A Java VM will claim and release memory from the operating system pretty much indepdently of what the application code is doing. This allows it to grab and release memory in large chunks, which is hugely more efficient than doing it in tiny individual operations, as you get with manual memory management.

This article was written in 2005, and JVM-style memory management was already streets ahead. The situation has only improved since then.

Which language boasts faster raw allocation performance, the Java language, or C/C++? The answer may surprise you -- allocation in modern JVMs is far faster than the best performing malloc implementations. The common code path for new Object() in HotSpot 1.4.2 and later is approximately 10 machine instructions (data provided by Sun; see Resources), whereas the best performing malloc implementations in C require on average between 60 and 100 instructions per call (Detlefs, et. al.; see Resources). And allocation performance is not a trivial component of overall performance -- benchmarks show that many real-world C and C++ programs, such as Perl and Ghostscript, spend 20 to 30 percent of their total execution time in malloc and free -- far more than the allocation and garbage collection overhead of a healthy Java application.

0 讨论(0)
发布评论:

提交评论
- 加载中...
隐瞒了意图╮

2020-11-30 21:10

Others have covered C/C++ so I'll just add a little information on .NET.

In .NET heap allocation is generally really fast, as it it just a matter of just grabbing the memory in the generation zero part of the heap. Obviously this cannot go on forever, which is where garbage collection comes in. Garbage collection may affect the performance of your application significantly since user threads must be suspended during compaction of memory. The fewer full collects, the better.

There are various things you can do to affect the workload of the garbage collector in .NET. Generally if you have a lot of memory reference the garbage collector will have to do more work. E.g. by implementing a graph using an adjacency matrix instead of references between nodes the garbage collector will have to analyze fewer references.

Whether that is actually significant in your application or not depends on several factors and you should profile the application with actual data before turning to such optimizations.

0 讨论(0)
发布评论:

提交评论
- 加载中...
故里飘歌

2020-11-30 21:11

First off, since you said malloc, I assume you're talking about C or C++.

Memory allocation and deallocation tend to be a significant bottleneck for real-world programs. A lot goes on "under the hood" when you allocate or deallocate memory, and all of it is system-specific; memory may actually be moved or defragmented, pages may be reorganized--there's no platform-independent way way to know what the impact will be. Some systems (like a lot of game consoles) also don't do memory defragmentation, so on those systems, you'll start to get out-of-memory errors as memory becomes fragmented.

A typical workaround is to allocate as much memory up front as possible, and hang on to it until your program exits. You can either use that memory to store big monolithic sets of data, or use a memory pool implementation to dole it out in chunks. Many C/C++ standard library implementations do a certain amount of memory pooling themselves for just this reason.

No two ways about it, though--if you have a time-sensitive C/C++ program, doing a lot of memory allocation/deallocation will kill performance.

0 讨论(0)
发布评论:

提交评论
- 加载中...
北荒

2020-11-30 21:20
Pretty much all of you are off base if you are talking about the Microsoft heap. Syncronization is effortlessly handled as is fragmentation.

The current perferrred heap is the LFH, (LOW FRAGMENTATION HEAP), it is default in vista+ OS's and can be configured on XP, via gflag, with out much trouble

It is easy to avoid any locking/blocking/contention/bus-bandwitth issues and the lot with the
```
HEAP_NO_SERIALIZE
```
option during HeapAlloc or HeapCreate. This will allow you to create/use a heap without entering into an interlocked wait.

I would reccomend creating several heaps, with HeapCreate, and defining a macro, perhaps, mallocx(enum my_heaps_set, size_t);

would be fine, of course, you need realloc, free also to be setup as appropiate. If you want to get fancy, make free/realloc auto-detect which heap handle on it's own by evaluating the address of the pointer, or even adding some logic to allow malloc to identify which heap to use based on it's thread id, and building a heierarchy of per-thread heaps and shared global heap's/pools.

The Heap* api's are called internally by malloc/new.

Here's a nice article on some dynamic memory management issues, with some even nicer references. To instrument and analyze heap activity.
0 讨论(0)
发布评论:

提交评论
- 加载中...
离开以前

2020-11-30 21:21

It's significant, especially as fragmentation grows and the allocator has to hunt harder across larger heaps for the contiguous regions you request. Most performance-sensitive applications typically write their own fixed-size block allocators (eg, they ask the OS for memory 16MB at a time and then parcel it out in fixed blocks of 4kb, 16kb, etc) to avoid this issue.

In games I've seen calls to malloc()/free() consume as much as 15% of the CPU (in poorly written products), or with carefully written and optimized block allocators, as little as 5%. Given that a game has to have a consistent throughput of sixty hertz, having it stall for 500ms while a garbage collector runs occasionally isn't practical.

0 讨论(0)
发布评论:

提交评论
- 加载中...
北荒

2020-11-30 21:21

In Java (and potentially other languages with a decent GC implementation) allocating an object is very cheap. In the SUN JVM it only needs 10 CPU Cycles. A malloc in C/c++ is much more expensive, just because it has to do more work.

Still even allocation objects in Java is very cheap, doing so for a lot of users of a web application in parallel can still lead to performance problems, because more Garbage Collector runs will be triggered. Therefore there are those indirect costs of an allocation in Java caused by the deallocation done by the GC. These costs are difficult to quantify because they depend very much on your setup (how much memory do you have) and your application.

0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页