I'm currently working on a project for medical image processing that needs a huge amount of memory. Is there anything I can do to avoid heap fragmentation and to speed up access to image data that has already been loaded into memory?
The application has been written in C++ and runs on Windows XP.
EDIT: The application does some preprocessing with the image data, like reformatting, calculating look-up tables, extracting sub-images of interest ... The application needs about 2 GB RAM during processing, of which about 1.5 GB may be used for the image data.
If you are doing medical image processing it is likely that you are allocating big blocks at a time (512x512, 2-byte per pixel images). Fragmentation will bite you if you allocate smaller objects between the allocations of image buffers.
Writing a custom allocator is not necessarily hard for this particular use-case. You can use the standard C++ allocator for your Image object, but for the pixel buffer you can use custom allocation that is all managed within your Image object. Here's a quick and dirty outline:
- Use a static array of structs, where each struct has:
  - A solid chunk of memory that can hold N images -- the chunking will help control fragmentation -- try an initial N of 5 or so
  - A parallel array of bools indicating whether the corresponding image is in use
- To allocate, search the array for an empty buffer and set its flag
- If none is found, append a new struct to the end of the array
- To deallocate, find the corresponding buffer in the array(s) and clear the boolean flag
This is just one simple idea with lots of room for variation. The main trick is to avoid freeing and reallocating the image pixel buffers.
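Here is a minimal sketch of that idea, assuming fixed-size 512x512, 2-byte-per-pixel images; the `ImagePool` name and the chunk size are made up for illustration, and a real version would add error handling and locking:

```cpp
#include <cstddef>
#include <vector>

const std::size_t kImageBytes     = 512 * 512 * 2;  // one 512x512, 2-byte-per-pixel buffer
const std::size_t kImagesPerChunk = 5;              // initial N from the outline above

class ImagePool {
    struct Chunk {
        std::vector<unsigned char> memory;   // solid chunk holding N image buffers
        bool inUse[kImagesPerChunk];         // parallel "in use" flags
        Chunk() : memory(kImageBytes * kImagesPerChunk) {
            for (std::size_t i = 0; i < kImagesPerChunk; ++i) inUse[i] = false;
        }
    };
    std::vector<Chunk*> chunks_;             // chunks held by pointer so buffers never move

public:
    ~ImagePool() {
        for (std::size_t c = 0; c < chunks_.size(); ++c) delete chunks_[c];
    }

    // Search for a free slot; if every chunk is full, append a new chunk.
    unsigned char* allocate() {
        for (std::size_t c = 0; c < chunks_.size(); ++c)
            for (std::size_t i = 0; i < kImagesPerChunk; ++i)
                if (!chunks_[c]->inUse[i]) {
                    chunks_[c]->inUse[i] = true;
                    return &chunks_[c]->memory[i * kImageBytes];
                }
        chunks_.push_back(new Chunk);
        chunks_.back()->inUse[0] = true;
        return &chunks_.back()->memory[0];
    }

    // Find the chunk that contains this buffer and clear its flag.
    void deallocate(unsigned char* p) {
        for (std::size_t c = 0; c < chunks_.size(); ++c) {
            unsigned char* base = &chunks_[c]->memory[0];
            if (p >= base && p < base + kImageBytes * kImagesPerChunk) {
                chunks_[c]->inUse[(p - base) / kImageBytes] = false;
                return;
            }
        }
    }
};
```

The chunks are kept by pointer so that appending a new chunk never moves pixel buffers that have already been handed out.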
There are answers, but it's difficult to be general without knowing the details of the problem.
I'm assuming 32-bit Windows XP.
Try to avoid needing hundreds of MB of contiguous memory; if you are unlucky, a few random DLLs will load themselves at inconvenient points throughout your available address space, rapidly cutting down the very large areas of contiguous memory. Depending on which APIs you need, this can be quite hard to prevent. It can be quite surprising how just allocating a couple of 400 MB blocks of memory, in addition to some 'normal' memory usage, can leave you with nowhere to allocate a final 'little' 40 MB block.
On the other hand, do preallocate reasonably sized chunks at a time; on the order of 10 MB or so is a good compromise block size. If you can manage to partition your data into chunks of this sort of size, you'll be able to fill the address space reasonably efficiently.
If you're still going to run out of address space, you're going to need to be able to page blocks in and out based on some sort of caching algorithm. Choosing the right blocks to page out will depend very much on your processing algorithm and will need careful analysis.
Choosing where to page things out to is another decision. You might decide to just write them to temporary files. You could also investigate Microsoft's Address Windowing Extensions API. In either case you need to be careful in your application design to clean up any pointers that point to something that is about to be paged out, otherwise really bad things(tm) will happen.
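To make the temporary-file option concrete, here is a rough sketch assuming one fixed-size block per file; the bookkeeping a real cache needs (which blocks are resident, which are dirty, and so on) is left out:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Write the block to disk, then release its memory so the address
// range becomes available again.
void pageOut(std::vector<char>& block, const char* fileName) {
    std::FILE* f = std::fopen(fileName, "wb");
    if (f) {
        if (!block.empty())
            std::fwrite(&block[0], 1, block.size(), f);
        std::fclose(f);
    }
    std::vector<char>().swap(block);   // swap trick: actually frees the capacity
}

// Re-read the block when it is needed again.
void pageIn(std::vector<char>& block, const char* fileName, std::size_t size) {
    block.resize(size);
    std::FILE* f = std::fopen(fileName, "rb");
    if (f) {
        std::fread(&block[0], 1, size, f);
        std::fclose(f);
    }
}
```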
Good Luck!
If you are going to be performing operations on a large image matrix, you might want to consider a technique called "tiling". The idea is to lay the image out in memory so that a contiguous block of bytes holds not the pixels of one scan line, but the pixels of a square region in 2D space. The rationale is that most operations touch pixels that are close to each other in 2D, not just along one scan line.
This is not going to reduce your memory use, but may have a huge impact on page swapping and performance.
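A small sketch of what a tiled layout looks like in code, assuming square 64x64 tiles and showing only the index arithmetic; the tile size is an arbitrary choice for illustration:

```cpp
#include <cstddef>

const std::size_t TILE = 64;  // 64x64-pixel tiles, chosen arbitrarily

// Index of pixel (x, y) in a buffer laid out as a grid of TILE x TILE tiles;
// 'tilesPerRow' is the image width divided by TILE (rounded up).
inline std::size_t tiledIndex(std::size_t x, std::size_t y, std::size_t tilesPerRow) {
    std::size_t tileX = x / TILE, tileY = y / TILE;      // which tile
    std::size_t inX   = x % TILE, inY   = y % TILE;      // offset inside the tile
    std::size_t tileNumber = tileY * tilesPerRow + tileX;
    return tileNumber * TILE * TILE + inY * TILE + inX;  // pixels of one tile are contiguous
}
```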
Without much more information about the problem (for example the language), one thing you can do is avoid allocation churn by reusing allocations rather than repeatedly allocating, operating and freeing. An allocator such as dlmalloc also handles fragmentation better than the Win32 heaps.
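As a small illustration of reusing allocations, the loop below keeps one scratch buffer alive across all images instead of allocating and freeing it per image; the per-image operation itself is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Reuse one scratch buffer for every image instead of the
// allocate / operate / free pattern.
void processAll(const std::vector<std::vector<unsigned short> >& images) {
    std::vector<unsigned short> scratch;      // allocated once, reused below
    for (std::size_t i = 0; i < images.size(); ++i) {
        scratch.resize(images[i].size());     // grows only when a larger image comes along
        // ... run the per-image operation, writing results into 'scratch' ...
    }
}                                             // one deallocation at the end, not one per image
```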
What you will be hitting here is the virtual address range limit, which on 32-bit Windows gives you at most 2 GB. You should also be aware that a graphics API like DirectX or OpenGL will consume extensive portions of those 2 GB for frame buffers, textures and similar data.
1.5-2 GB for a 32-bit application is quite hard to achieve. The most elegant way to do this is to use a 64-bit OS and a 64-bit application. Even with a 64-bit OS and a 32-bit application this may be somewhat viable, as long as you use LARGE_ADDRESS_AWARE.
However, as you need to store image data, you may also be able to work around this by using File Mapping as a memory store - this can be done in such a way that the memory is committed and accessible, yet virtual addresses are only consumed by the views you currently have mapped.
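A sketch of that approach, using a pagefile-backed CreateFileMapping and mapping only a window of it at a time; the sizes are illustrative and error handling is minimal:

```cpp
#include <windows.h>

int main() {
    // ~1.5 GB of pagefile-backed store; only the mapped view uses address space.
    const unsigned __int64 total = (unsigned __int64)1536 * 1024 * 1024;
    const SIZE_T view = 64 * 1024 * 1024;               // map 64 MB at a time

    HANDLE mapping = CreateFileMapping(
        INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
        (DWORD)(total >> 32), (DWORD)(total & 0xFFFFFFFF), NULL);
    if (!mapping) return 1;

    // Map just the window we need, work on it, then unmap it so the
    // address range can be reused for the next window.
    void* p = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, view);
    if (p) {
        // ... read/write image data through p ...
        UnmapViewOfFile(p);
    }
    CloseHandle(mapping);
    return 0;
}
```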
Guessing here that you meant avoiding fragmentation, not avoiding defragmentation. Also guessing that you are working with a non-managed language (C or C++, probably). I would suggest that you allocate large chunks of memory and then serve heap allocations from those allocated blocks. Because this pool consists of large blocks of memory, it is less prone to fragmentation. To sum up, you should implement a custom memory allocator.
See some general ideas on this here.
I guess you're using something unmanaged, because on managed platforms the system (garbage collector) takes care of fragmentation.
For C/C++ you can use an allocator other than the default one (there were already some threads about allocators on Stack Overflow).
Also, you can create your own data storage. For example, in the project I'm currently working on, we have a custom storage (pool) for bitmaps (we store them in a large contiguous chunk of memory), because we have a lot of them, and we keep track of heap fragmentation and defragment it when the fragmentation gets too big.
You might need to implement manual memory management. Is the image data long-lived? If not, you can use the pattern used by the Apache web server: allocate large amounts of memory and wrap them into memory pools. Pass those pools as the last argument to functions, so they can use the pool to satisfy their need for temporary memory. Once the call chain is finished, all the memory in the pool should no longer be in use, so you can scrub the memory area and use it again. Allocations are fast, since they only mean adding a value to a pointer. Deallocation is really fast, since you free very large blocks of memory at once.
If your application is multithreaded, you might need to store the pool in thread local storage, to avoid cross-thread communication overhead.
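Here is a minimal arena-style pool in the spirit of that pattern; the names are invented, it uses a single fixed block for brevity, and a real pool would grow by chaining blocks:

```cpp
#include <cstddef>
#include <new>

class Pool {
    char*       base_;
    std::size_t size_;
    std::size_t used_;
public:
    explicit Pool(std::size_t bytes)
        : base_(new char[bytes]), size_(bytes), used_(0) {}
    ~Pool() { delete[] base_; }

    // Allocation: just advance a pointer (8-byte aligned).
    void* alloc(std::size_t n) {
        std::size_t aligned = (used_ + 7) & ~std::size_t(7);
        if (aligned + n > size_) throw std::bad_alloc();
        used_ = aligned + n;
        return base_ + aligned;
    }

    // "Deallocation": the whole call chain is finished, reuse everything at once.
    void reset() { used_ = 0; }
};

// Usage: pass the pool as the last argument so callees can take
// temporary memory from it (hypothetical helper for illustration).
void buildLookupTable(int bits, Pool& pool) {
    int* table = static_cast<int*>(pool.alloc((std::size_t(1) << bits) * sizeof(int)));
    // ... fill and use 'table'; no individual free is needed ...
    (void)table;
}
```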
If you can isolate exactly those places where you're likely to allocate large blocks, you can (on Windows) directly call VirtualAlloc instead of going through the memory manager. This will avoid fragmentation within the normal memory manager.
This is an easy solution and it doesn't require you to use a custom memory manager.
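For example, a large pixel buffer could be obtained directly from VirtualAlloc like this (the 400 MB size is just an illustration):

```cpp
#include <windows.h>

int main() {
    const SIZE_T bytes = 400 * 1024 * 1024;   // one large image block

    // Reserve and commit the block outside the normal heap.
    void* block = VirtualAlloc(NULL, bytes,
                               MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!block) return 1;

    // ... load and process image data in 'block' ...

    VirtualFree(block, 0, MEM_RELEASE);       // size must be 0 with MEM_RELEASE
    return 0;
}
```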
Source: https://stackoverflow.com/questions/150753/how-to-avoid-heap-fragmentation