I have written a converter that takes OpenStreetMap XML files and converts them to a binary runtime rendering format that is typically about 10% of the original size. Input file
On 32-bit XP, your maximum program address space is 2GB. On top of that, the address space gets fragmented by DLLs and drivers loading into it. Finally, you have the problem of your heap fragmenting.
Your best move is to just get it over with and run as a 64-bit process (on a 64-bit system). Suddenly, all these problems go away. If that isn't an option, you can use a better heap to mitigate heap fragmentation, and you can try using VirtualAlloc to grab your memory in one big contiguous chunk (and then you get to manage it from there!) to keep DLLs/drivers from fragmenting it.
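A minimal sketch of the "one big chunk" idea in C: reserve a single contiguous region early, then hand out pieces of it with a bump allocator. `Arena`, `arena_init`, `arena_alloc` and `arena_release` are hypothetical names. On Windows it uses VirtualAlloc; on POSIX systems `mmap` is swapped in as the equivalent so the same sketch builds on Linux. A bump allocator never frees individual pieces. That suits a converter that builds its output and then exits, but it is an assumption about your workload.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict glibc modes */
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Hypothetical arena: one contiguous region, handed out front to back. */
typedef struct {
    unsigned char *base; /* start of the region */
    size_t size;         /* bytes reserved */
    size_t used;         /* bytes handed out so far */
} Arena;

/* Grab `size` bytes of contiguous address space in one call.
 * Returns 0 on success, -1 if the OS cannot find a hole that big. */
static int arena_init(Arena *a, size_t size) {
#ifdef _WIN32
    /* Reserves and commits in one step; committing on demand would
       ease pagefile pressure but is left out to keep the sketch short. */
    void *p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (p == NULL) return -1;
#else
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
#endif
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Hand out `n` bytes aligned to 16; NULL once the arena is exhausted. */
static void *arena_alloc(Arena *a, size_t n) {
    assert(a->base != NULL);
    size_t start = (a->used + 15) & ~(size_t)15;
    if (start > a->size || n > a->size - start) return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Give the whole region back to the OS at once. */
static void arena_release(Arena *a) {
#ifdef _WIN32
    VirtualFree(a->base, 0, MEM_RELEASE);
#else
    munmap(a->base, a->size);
#endif
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

You would call `arena_init` once, as early as possible in `main`, sized to roughly what the converter needs. On 32-bit XP a request close to the 2GB limit will usually fail, so size it to what the process can actually get.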
Finally, you can split your BSP across processes. It's complicated and painful, and frankly just putting it on disk would be easier, but in theory you could get better performance by having a group of processes exchange information, if you can keep everything resident (and assuming you can manage memory better than the OS handles file buffering... which is a big if). Each process would need far less memory and therefore shouldn't run into the 2GB address space limit. Of course, you'll burn through RAM/swap a lot faster.
You can mitigate the effects of address-space fragmentation by allocating smaller chunks. This has other nasty side effects, but you could follow a backoff policy where you grab smaller and smaller chunks of memory whenever an allocation fails. Frequently this simple approach gets you a program that works when it otherwise wouldn't, while the rest of the time it performs as well as it would have anyway.
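The backoff policy might look like this in C. `alloc_with_backoff` halves its request until `malloc` succeeds or the size drops below a floor. `alloc_chunks` uses it to gather a large total out of as few pieces as the address space allows. Both names, the `Chunk` type and the halving factor are hypothetical, illustrative choices rather than a prescribed design. The nasty side effect mentioned above shows up here: the caller now has to cope with data split across several chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One piece of a split allocation. */
typedef struct {
    void *ptr;
    size_t size;
} Chunk;

/* Try `want` bytes; on failure halve the request until malloc succeeds
 * or the size drops below `min`. The size obtained goes in *got
 * (0 if every attempt failed). */
static void *alloc_with_backoff(size_t want, size_t min, size_t *got) {
    assert(min > 0); /* guarantees the halving loop terminates */
    for (size_t n = want; n >= min; n /= 2) {
        void *p = malloc(n);
        if (p != NULL) {
            *got = n;
            return p;
        }
    }
    *got = 0;
    return NULL;
}

/* Gather `total` bytes as up to `max_chunks` pieces, starting with
 * requests of `start` bytes and backing off toward `min`. Returns the
 * number of chunks used, or -1 (with everything freed) on failure. */
static int alloc_chunks(size_t total, size_t start, size_t min,
                        Chunk *out, int max_chunks) {
    assert(start > 0 && min > 0);
    int count = 0;
    size_t request = start;
    while (total > 0) {
        if (count == max_chunks) goto fail;
        size_t want = request < total ? request : total;
        size_t lo = want < min ? want : min; /* last piece may be small */
        size_t got;
        void *p = alloc_with_backoff(want, lo, &got);
        if (p == NULL) goto fail;
        out[count].ptr = p;
        out[count].size = got;
        count++;
        total -= got;
        request = got; /* never ask for more than last time; larger sizes may have just failed */
    }
    return count;
fail:
    while (count > 0) free(out[--count].ptr);
    return -1;
}
```

When memory is plentiful, the first request succeeds every time and you pay nothing extra. The smaller pieces only appear when the large ones fail.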
Boy, doesn't 64-bit computing just sound so much nicer than the other choices?
You may not be allocating and deallocating memory optimally. As others have pointed out, you may be leaking memory and not knowing it. Debugging and optimizing memory allocation will take time.
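A crude first check for leaks is to route allocations through counting wrappers and look at the count when the program exits. `dbg_malloc`, `dbg_free` and `report_leaks` are hypothetical names. The counter is not thread-safe, and it only sees code that actually calls the wrappers. A tool such as Valgrind gives far more detail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of blocks handed out by dbg_malloc and not yet released. */
static long live_allocs = 0;

static void *dbg_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_allocs++;
    return p;
}

static void dbg_free(void *p) {
    if (p != NULL) { /* free(NULL) is a no-op, so it is not counted */
        assert(live_allocs > 0); /* more frees than allocations: a double free? */
        live_allocs--;
    }
    free(p);
}

/* Register with atexit() so the count is checked when the program ends. */
static void report_leaks(void) {
    if (live_allocs != 0)
        fprintf(stderr, "possible leak: %ld block(s) never freed\n", live_allocs);
}
```

In the converter you would call `atexit(report_leaks)` at the top of `main` and swap `malloc`/`free` for the wrappers, for example behind a debug-only macro.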
If you don't want to spend time optimizing memory usage, why not try the Conservative Garbage Collector? It's a plug-in replacement for malloc()/new and free(). In fact, free() is a no-op, so you can just remove those calls from your program. If, instead, you hand-optimize your program and manage a pool of memory as previously suggested, you'll end up doing a lot of the work that the CGC already does for you.
Have you checked to ensure you aren't leaking memory anywhere?
Since your program is portable to Linux, I suggest running it under Valgrind to make sure.