How to avoid running out of memory in a high-memory-usage application? C / C++

天命终不由人 2021-02-19 14:04

I have written a converter that takes OpenStreetMap XML files and converts them to a binary runtime rendering format that is typically about 10% of the original size. Input file

15 Answers
  • 2021-02-19 14:24

    You need to stream your output as well as your input. If your output format is not stream-oriented, consider doing a second pass. For example, if the output file starts with a checksum/size of the data, leave space for it on the first pass and seek back to write it later.
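
    As a rough illustration of that seek-back approach (the 8-byte size field and the function name are my own assumptions, not something from this answer):

        // Sketch: reserve space for a size field, stream the data, then patch it in a second pass.
        #include <cstdint>
        #include <fstream>

        void write_with_patched_header(const char* path) {
            std::ofstream out(path, std::ios::binary);

            std::uint64_t payload_size = 0;   // placeholder written on the first pass
            out.write(reinterpret_cast<const char*>(&payload_size), sizeof(payload_size));

            // ... stream the converted data here, adding each chunk's length to payload_size ...

            out.seekp(0);                     // second pass: jump back to the reserved slot
            out.write(reinterpret_cast<const char*>(&payload_size), sizeof(payload_size));
        }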

  • 2021-02-19 14:26

    A good technique for this is to store some instances in files on disk and only load them back when you need to use them.

    This technique is used by open-source software such as Doxygen to remain scalable when a large amount of memory is needed.
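
    A minimal sketch of that spill-to-disk idea, assuming simple fixed-size records (the struct and helper names are only illustrative):

        // Sketch: spill fixed-size records to a scratch file, read one back only when needed.
        #include <cstddef>
        #include <cstdio>

        struct Node { double lat, lon; long long id; };

        // Append a record to the scratch file and return its index.
        std::size_t spill(std::FILE* f, const Node& n) {
            std::fseek(f, 0, SEEK_END);
            std::size_t index = static_cast<std::size_t>(std::ftell(f)) / sizeof(Node);
            std::fwrite(&n, sizeof(Node), 1, f);
            return index;
        }

        // Load a single record back on demand.
        Node load(std::FILE* f, std::size_t index) {
            Node n{};
            std::fseek(f, static_cast<long>(index * sizeof(Node)), SEEK_SET);
            std::fread(&n, sizeof(Node), 1, f);
            return n;
        }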

  • 2021-02-19 14:26

    This is an old question but, since I've recently done the same thing ....

    There is no simple answer. In an ideal world you'd use a machine with a huge address space (i.e. 64-bit) and massive amounts of physical memory. A huge address space alone is not sufficient, or it will just thrash. In that case, parse the XML file into a database and, with appropriate queries, pull out what you need. Quite likely this is what OSM itself does (I believe the whole-world dataset is about 330GB).

    In reality I'm still using 32-bit XP for reasons of expediency.

    It's a trade-off between space and speed. You can do pretty much anything in any amount of memory, provided you don't care how long it takes. Using STL structures you can parse anything you want, but you'll soon run out of memory. You can define your own allocators that swap to disk, but again, it will be inefficient because the maps, vectors, sets etc. do not really know what you are doing.

    The only way I found to make it all work in a small footprint on a 32-bit machine was to think very carefully about what I was doing, about what was needed and when, and to break the task into chunks. Memory-efficient (it never uses more than ~100MB) but not massively quick, though that hardly matters: how often does one have to parse the XML data?

  • 2021-02-19 14:29

    Assuming you are using Windows XP, if you are only just over your memory limit and do not want, or do not have the time, to rework the code as suggested above, you can add the /3GB switch to your boot.ini file; then it's just a matter of setting a linker switch to get an extra 1GB of address space for your process.
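
    For reference, the two pieces look roughly like this. The linker switch in question is /LARGEADDRESSAWARE; the boot.ini entry and output names below are only illustrative (your ARC path will differ):

        # In boot.ini (Windows XP era), append /3GB to the OS entry, for example:
        multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect /3GB

        # Then link the executable as large-address-aware (MSVC linker switch):
        link /LARGEADDRESSAWARE /OUT:converter.exe converter.obj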

  • 2021-02-19 14:30

    You don't need to switch to a 64-bit machine, nor do you need most of the 1000 things suggested by others. What you need is a more thoughtful algorithm.

    Here are some things you can do to help out with this situation:

    • If you're on Windows, utilize File Maps (sample code); a minimal sketch is shown after this list. This will give you access to the file via a single buffer pointer, as though you had read the whole file into memory, only without actually doing so. Recent versions of the Linux kernel have a similar mechanism (mmap).
    • If you can, and it looks like you can, scan the file sequentially and avoid building an in-memory DOM. This will greatly decrease your load time as well as your memory requirements.
    • Use pooled memory! You will probably have many tiny objects, such as nodes, points and whatnot. Use a memory pool to help out (I'm assuming you're using an unmanaged language; search for "pooled allocation" and "memory pools"). A rough pool sketch appears at the end of this answer.
    • If you're using a managed language, at least move this particular part into an unmanaged language and take control of the memory and file reading yourself. Managed languages have non-trivial overhead in both memory footprint and performance. (Yes, I know this is tagged "C++"...)
    • Attempt to design a streaming, in-place algorithm, where you read and process only the minimum amount of data at a time, so that your memory requirements go down.
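
    Here is a minimal sketch of the Windows file-mapping idea from the first bullet (error handling omitted, and the function name is my own; the POSIX equivalent is mmap):

        // Sketch: map the input file and walk it through a single pointer (Win32).
        #include <windows.h>

        void process_mapped_file(const char* path) {
            HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                                      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
            HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
            const char* data = static_cast<const char*>(
                MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));

            LARGE_INTEGER size{};
            GetFileSizeEx(file, &size);

            // ... scan data[0 .. size.QuadPart) sequentially; the OS pages it in on demand.
            //     In a 32-bit process, map smaller views one at a time rather than the whole file.

            UnmapViewOfFile(data);
            CloseHandle(mapping);
            CloseHandle(file);
        }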

    Finally, let me point out that complex tasks require complex measures. If you think you can afford a 64-bit machine with 8GB of RAM, then just use the "read file into memory, process data, write output" approach, even if it takes a day to finish.
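
    As for the pooled-allocation bullet above, here is a rough sketch assuming fixed-size node objects (the struct, class name and block size are illustrative, not from the original answer):

        // Sketch: a fixed-size object pool that hands out nodes from large blocks,
        // avoiding per-object heap overhead and fragmentation.
        #include <cstddef>
        #include <memory>
        #include <vector>

        struct Node { double lat, lon; long long id; };

        class NodePool {
        public:
            Node* allocate() {
                if (used_ == kBlockSize) {                 // current block exhausted (or none yet)
                    blocks_.push_back(std::make_unique<Node[]>(kBlockSize));
                    used_ = 0;
                }
                return &blocks_.back()[used_++];
            }
            // Every node is released at once when the pool itself is destroyed.
        private:
            static constexpr std::size_t kBlockSize = 4096;
            std::vector<std::unique_ptr<Node[]>> blocks_;
            std::size_t used_ = kBlockSize;                // forces a new block on first use
        };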

  • 2021-02-19 14:30

    First, on a 32-bit system, you will always be limited to 4GB of address space, no matter what your pagefile settings are. (And of that, only 2GB will be available to your process on Windows; on Linux you'll typically have around 3GB available.)

    So the first obvious solution is to switch to a 64-bit OS, and compile your application for 64-bit. That gives you a huge virtual memory space to use, and the OS will swap data in and out of the pagefile as necessary to keep things working.

    Second, allocating smaller chunks of memory at a time may help. It's often easier to find four free 256MB chunks than one free 1GB chunk.

    Third, split up the problem. Don't process the entire dataset at once, but try to load and process only a small section at a time.
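
    A minimal sketch of that chunked approach, reading the input in fixed-size buffers so memory use stays bounded (the 1MB buffer size is an arbitrary choice of mine):

        // Sketch: stream the input in fixed-size chunks so memory use stays bounded.
        #include <fstream>
        #include <vector>

        void process_in_chunks(const char* path) {
            std::ifstream in(path, std::ios::binary);
            std::vector<char> buffer(1 << 20);   // 1MB working buffer

            while (in) {
                in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
                std::streamsize got = in.gcount();
                if (got <= 0) break;
                // ... feed buffer[0 .. got) to a streaming (SAX-style) XML parser here ...
            }
        }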
