I'm currently working on a project for medical image processing that needs a huge amount of memory. Is there anything I can do to avoid heap fragmentation and to speed up
What you will be hitting here is the virtual address range limit, which on 32-bit Windows gives you at most 2 GB. You should also be aware that using a graphics API like DirectX or OpenGL will use extensive portions of those 2 GB for frame buffers, textures and similar data.
1.5-2 GB for a 32-bit application is quite hard to achieve. The most elegant way to do this is to use a 64-bit OS and a 64-bit application. Even a 32-bit application on a 64-bit OS may be somewhat viable, as long as the executable is marked large-address-aware (the /LARGEADDRESSAWARE linker option).
However, as you need to store image data, you may also be able to work around this by using File Mapping as a memory store - this can be done in such a way that the memory is committed and available, but uses no virtual addresses at all until you map a view of it.
If you can isolate exactly those places where you're likely to allocate large blocks, you can (on Windows) directly call VirtualAlloc instead of going through the memory manager. This will avoid fragmentation within the normal memory manager.
This is an easy solution and it doesn't require you to use a custom memory manager.
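For illustration, a minimal sketch of what that might look like. The helper names allocLargeBlock and freeLargeBlock are made up for this example, and the mmap branch is only an assumption added so the snippet also builds on non-Windows systems; treat it as a starting point rather than a finished allocator.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>  // assumption: fallback so the sketch builds off Windows
#endif

// Hypothetical helper: reserve and commit a large block straight from the OS,
// bypassing the C runtime heap. Returns nullptr on failure.
inline void* allocLargeBlock(std::size_t bytes) {
    assert(bytes > 0);
#ifdef _WIN32
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Hypothetical helper: hand the whole block back to the OS.
inline void freeLargeBlock(void* p, std::size_t bytes) {
    if (!p) return;
#ifdef _WIN32
    (void)bytes;                     // MEM_RELEASE requires a size of 0
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, bytes);
#endif
}
```

Keep in mind that VirtualAlloc works in whole pages and reserves address space in coarse units, so it only makes sense for genuinely large blocks such as image buffers.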
There are answers, but it's difficult to be general without knowing the details of the problem.
I'm assuming 32-bit Windows XP.
Try to avoid needing hundreds of MB of contiguous memory. If you are unlucky, a few random DLLs will load themselves at inconvenient points throughout your available address space, rapidly cutting down the very large areas of contiguous memory. Depending on what APIs you need, this can be quite hard to prevent. It can be quite surprising how just allocating a couple of 400 MB blocks of memory in addition to some 'normal' memory usage can leave you with nowhere to allocate a final 'little' 40 MB block.
On the other hand, do preallocate reasonably sized chunks at a time. Something on the order of 10 MB is a good compromise block size. If you can manage to partition your data into chunks of this size, you'll be able to fill the address space reasonably efficiently.
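As a rough sketch of the chunking idea, assuming plain byte data and a 10 MB default chunk size (the ChunkedBuffer class and its members are hypothetical names invented for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical container: a logically contiguous byte array stored as
// independent fixed-size chunks, so no single huge allocation is needed.
class ChunkedBuffer {
public:
    explicit ChunkedBuffer(std::size_t totalBytes,
                           std::size_t chunkBytes = 10u * 1024u * 1024u)
        : size_(totalBytes), chunkBytes_(chunkBytes) {
        assert(chunkBytes > 0);
        const std::size_t count = (totalBytes + chunkBytes - 1) / chunkBytes;
        chunks_.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            // Value-initialised, so every chunk starts zero-filled.
            chunks_.push_back(std::make_unique<std::uint8_t[]>(chunkBytes));
        }
    }

    // Translate a flat index into (chunk, offset within chunk).
    std::uint8_t& operator[](std::size_t i) {
        assert(i < size_);
        return chunks_[i / chunkBytes_][i % chunkBytes_];
    }

    std::size_t size() const { return size_; }
    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t size_;
    std::size_t chunkBytes_;
    std::vector<std::unique_ptr<std::uint8_t[]>> chunks_;
};
```

Each chunk only needs 10 MB of contiguous address space, so the allocator can fit the chunks into whatever gaps the DLLs have left. The price is one extra division per element access, which you would normally hide by processing a chunk at a time.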
If you're still going to run out of address space, you're going to need to be able to page blocks in and out based on some sort of caching algorithm. Choosing the right blocks to page out is going to depend very much on your processing algorithm and will need careful analysis.
Choosing where to page things out to is another decision. You might decide to just write them to temporary files. You could also investigate Microsoft's Address Windowing Extensions (AWE) API. In either case you need to be careful in your application design to clean up any pointers to something that is about to be paged out, otherwise really bad things(tm) will happen.
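To make the temporary-file variant concrete, here is a small sketch under some loud assumptions: least-recently-used eviction (your algorithm may call for something else), fixed-size blocks, and a single std::tmpfile() as the backing store. BlockCache and its members are hypothetical names, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <list>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical LRU cache: keeps at most maxResident blocks in memory and
// spills the least recently used one to a temporary file.
class BlockCache {
public:
    BlockCache(std::size_t blockBytes, std::size_t maxResident)
        : blockBytes_(blockBytes), maxResident_(maxResident),
          file_(std::tmpfile()) {
        assert(blockBytes > 0 && maxResident > 0 && file_ != nullptr);
    }
    ~BlockCache() { if (file_) std::fclose(file_); }
    BlockCache(const BlockCache&) = delete;
    BlockCache& operator=(const BlockCache&) = delete;

    // Returns the block's bytes, reading them back from disk if needed.
    // The pointer is only valid until the next call to get(), because that
    // call may evict this block; callers must not hold on to it.
    unsigned char* get(std::size_t id) {
        auto it = resident_.find(id);
        if (it != resident_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second.pos);  // most recent
            return it->second.data.data();
        }
        if (resident_.size() >= maxResident_) evictOldest();
        Entry e;
        e.data.assign(blockBytes_, 0);       // new blocks start zero-filled
        if (onDisk_.count(id) != 0) {        // previously paged out
            seekTo(id);
            const std::size_t n = std::fread(e.data.data(), 1, blockBytes_, file_);
            assert(n == blockBytes_);
            (void)n;
        }
        lru_.push_front(id);
        e.pos = lru_.begin();
        return resident_.emplace(id, std::move(e)).first->second.data.data();
    }

    std::size_t residentCount() const { return resident_.size(); }

private:
    struct Entry {
        std::vector<unsigned char> data;
        std::list<std::size_t>::iterator pos;  // position in lru_
    };

    // Note: the long offset limits this sketch to small backing files.
    void seekTo(std::size_t id) {
        std::fseek(file_, static_cast<long>(id * blockBytes_), SEEK_SET);
    }

    void evictOldest() {
        const std::size_t victim = lru_.back();
        seekTo(victim);
        const std::size_t n =
            std::fwrite(resident_.at(victim).data.data(), 1, blockBytes_, file_);
        assert(n == blockBytes_);
        (void)n;
        onDisk_.insert(victim);
        lru_.pop_back();
        resident_.erase(victim);
    }

    std::size_t blockBytes_;
    std::size_t maxResident_;
    std::FILE* file_;
    std::list<std::size_t> lru_;  // front = most recently used
    std::unordered_map<std::size_t, Entry> resident_;
    std::unordered_set<std::size_t> onDisk_;
};
```

Handing out block ids and re-fetching the pointer on every use is one way to deal with the dangling-pointer problem: nothing outside the cache keeps an address that could point at a paged-out block.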
Good Luck!
I'm guessing here that you meant avoid fragmentation and not avoid defragmentation. I'm also guessing that you are working with an unmanaged language (probably C or C++). I would suggest that you allocate large chunks of memory and then serve heap allocations from those allocated blocks. Because this pool consists of large blocks, it is less prone to fragmentation. To sum up, you should implement a custom memory allocator.
See some general ideas on this here.
If you are going to be performing operations on a large image matrix, you might want to consider a technique called "tiling". The idea is generally to lay the image out in memory so that a contiguous block of bytes holds the pixels of a square in 2D space rather than the pixels of one scan line. The rationale behind this is that most operations touch pixels that are close to each other in 2D, not just pixels on the same scan line.
This is not going to reduce your memory use, but may have a huge impact on page swapping and performance.
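A small sketch of the index arithmetic, assuming square tiles stored one after another in row-major order, with edge tiles padded to full size (TiledLayout is a hypothetical name, and the tile size is something you would tune):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout helper: maps (x, y) to an offset in a buffer where each
// tile-by-tile square of pixels is contiguous and tiles are in row-major order.
struct TiledLayout {
    std::size_t width;   // image width in pixels
    std::size_t height;  // image height in pixels
    std::size_t tile;    // tile edge length in pixels

    std::size_t tilesPerRow() const { return (width + tile - 1) / tile; }
    std::size_t tilesPerCol() const { return (height + tile - 1) / tile; }

    // Number of pixels the buffer must hold (edge tiles are padded).
    std::size_t elementCount() const {
        return tilesPerRow() * tilesPerCol() * tile * tile;
    }

    std::size_t index(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        const std::size_t tileIndex = (y / tile) * tilesPerRow() + (x / tile);
        const std::size_t inTile = (y % tile) * tile + (x % tile);
        return tileIndex * tile * tile + inTile;
    }
};
```

With a 64x64 tile of 2-byte pixels, each tile is 8 KB, so a small neighbourhood around a pixel usually touches a page or two instead of one page per scan line.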
If you are doing medical image processing it is likely that you are allocating big blocks at a time (for example 512x512 images at 2 bytes per pixel). Fragmentation will bite you if you allocate smaller objects between the allocations of image buffers.
Writing a custom allocator is not necessarily hard for this particular use-case. You can use the standard C++ allocator for your Image object, but for the pixel buffer you can use custom allocation that is all managed within your Image object. Here's a quick and dirty outline:
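A minimal sketch of how such a scheme might look, assuming all images share one size and 16-bit pixels. PixelBufferPool and this Image class are hypothetical names for illustration only, and the pool must outlive every Image that uses it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pool of equally sized pixel buffers. Released buffers go onto
// a free list and are handed out again instead of being returned to the heap.
class PixelBufferPool {
public:
    using Buffer = std::unique_ptr<std::uint16_t[]>;

    explicit PixelBufferPool(std::size_t pixelsPerBuffer)
        : pixels_(pixelsPerBuffer) {
        assert(pixelsPerBuffer > 0);
    }

    Buffer acquire() {
        if (!free_.empty()) {
            Buffer buf = std::move(free_.back());  // reuse; old contents remain
            free_.pop_back();
            return buf;
        }
        return std::make_unique<std::uint16_t[]>(pixels_);  // first-time use
    }

    void release(Buffer buf) { free_.push_back(std::move(buf)); }

    std::size_t freeCount() const { return free_.size(); }

private:
    std::size_t pixels_;
    std::vector<Buffer> free_;
};

// The Image object itself lives on the normal heap or stack; only its pixel
// buffer comes from the pool.
class Image {
public:
    explicit Image(PixelBufferPool& pool)
        : pool_(pool), pixels_(pool.acquire()) {}
    ~Image() { pool_.release(std::move(pixels_)); }
    Image(const Image&) = delete;
    Image& operator=(const Image&) = delete;

    std::uint16_t* pixels() { return pixels_.get(); }

private:
    PixelBufferPool& pool_;
    PixelBufferPool::Buffer pixels_;
};
```

Destroying an Image and creating a new one then reuses the same block of memory, so small allocations made in between cannot carve up the space the buffer occupied.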
This is just one simple idea with lots of room for variation. The main trick is to avoid freeing and reallocating the image pixel buffers.