How to delete parts from a binary file in C++

一生所求 2021-01-24 01:02

I would like to delete parts from a binary file using C++. The binary file is about 5-10 MB.

What I would like to do:

  1. Search for an ANSI string "so
3 Answers
  • 2021-01-24 01:19

    First, if I understand your meaning in your "How can I search efficiently" subsection, you cannot just skip a few megabytes of data in the search if the target string might be in those first few megabytes.

    As for loading the file into memory, if you do that, don't forget to make sure you have enough space in memory for the entire file. You will be frustrated if you go to use your utility and find that the 2GB file you want to use it on can't fit in the 1.5GB of memory you have left.

    For the rest of this answer, I will assume you either load the file into memory or memory-map it.

    You said this is a binary file, so you cannot use the C string functions (strstr and friends): they treat the null bytes in the file's data as terminators and stop early without a match. Length-aware routines such as std::search or std::string::find do not have this problem. Alternatively, use memchr to find the next occurrence of the first byte of your target, then memcmp to compare the bytes at that position with the target; repeat the memchr/memcmp pair until you find a match or reach the end of the buffer. There are faster pattern-matching algorithms, but this approach is reasonably efficient.
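    The memchr/memcmp scan described above could look roughly like the following. This is a minimal sketch, assuming the file is already in a contiguous buffer; the function name find_bytes and its "return the buffer length when absent" convention are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the offset of the first occurrence of `needle`
// in `hay`, or `hay_len` if it is absent. Both buffers are treated as raw
// bytes, so embedded '\0' bytes do not end the search.
std::size_t find_bytes(const char* hay, std::size_t hay_len,
                       const char* needle, std::size_t needle_len) {
    assert(needle_len > 0);
    if (needle_len > hay_len) return hay_len;
    const char* p = hay;
    // Last position at which a full match could still start.
    const char* last = hay + (hay_len - needle_len);
    while (p <= last) {
        // Jump to the next byte equal to the first byte of the needle.
        p = static_cast<const char*>(
            std::memchr(p, needle[0], static_cast<std::size_t>(last - p) + 1));
        if (p == nullptr) break;
        // Candidate found: compare the whole needle at this position.
        if (std::memcmp(p, needle, needle_len) == 0)
            return static_cast<std::size_t>(p - hay);
        ++p;  // no match here, resume the scan one byte later
    }
    return hay_len;
}
```

    A caller would pass the buffer and its length explicitly, for example find_bytes(buf.data(), buf.size(), "something", 9), and treat a result equal to buf.size() as "not found".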

    To "delete" n bytes you have to actually move the data after those n bytes, copying the entire thing up to the new location.
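    For an in-memory buffer, that move can be sketched as below. The helper name erase_bytes is hypothetical, and the sketch assumes the data lives in a std::vector<char>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: removes `count` bytes starting at `offset` from `buf`
// by sliding the tail down over the gap and then shrinking the buffer.
void erase_bytes(std::vector<char>& buf, std::size_t offset, std::size_t count) {
    assert(offset <= buf.size() && count <= buf.size() - offset);
    const std::size_t tail = buf.size() - offset - count;  // bytes after the gap
    // memmove rather than memcpy, because source and destination overlap.
    if (tail > 0)
        std::memmove(buf.data() + offset, buf.data() + offset + count, tail);
    buf.resize(buf.size() - count);
}
```

    With a std::vector, buf.erase(buf.begin() + offset, buf.begin() + offset + count) has the same effect; the explicit memmove is shown only to make the data movement visible.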

    If you have already copied the data from disk into memory, it is faster to manipulate it there and then write the result to a new file. Otherwise, once you know the file offset X where the deletion starts, open a new file for writing, read the first X bytes from the original file and write them straight to the new file, then seek in the original to X+n and copy from there to the end of the file, appending to what you have already written.
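    The file-to-file variant could be sketched with standard streams as follows. The names copy_bytes and copy_without_range and the 64 KiB chunk size are assumptions chosen for illustration, not part of the original answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: copies up to `count` bytes from `in` to `out` in chunks.
// Pass UINT64_MAX to copy until end of file. Returns false on a write error.
static bool copy_bytes(std::ifstream& in, std::ofstream& out, std::uint64_t count) {
    std::vector<char> chunk(64 * 1024);
    while (count > 0) {
        const std::uint64_t want = std::min<std::uint64_t>(count, chunk.size());
        in.read(chunk.data(), static_cast<std::streamsize>(want));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;  // end of file (or read error)
        if (!out.write(chunk.data(), got)) return false;
        count -= static_cast<std::uint64_t>(got);
    }
    return true;
}

// Hypothetical helper: writes `src` to `dst`, omitting the `n` bytes that
// start at offset `x`.
bool copy_without_range(const std::string& src, const std::string& dst,
                        std::uint64_t x, std::uint64_t n) {
    assert(src != dst);  // dst is truncated on open, so it must be a different file
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;
    if (!copy_bytes(in, out, x)) return false;     // bytes [0, X)
    in.clear();                                    // reset EOF state if X was past the end
    in.seekg(static_cast<std::streamoff>(x + n));  // skip the n unwanted bytes
    if (!in) return false;
    if (!copy_bytes(in, out, UINT64_MAX)) return false;  // bytes [X+n, EOF)
    out.flush();
    return static_cast<bool>(out);
}
```

    Writing to a second file and renaming it over the original afterwards also means a crash halfway through leaves the original untouched.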

  • 2021-01-24 01:20

    There are a number of fast string search routines that perform much better than testing each and every character. For example, when searching for the 9-character string "something", a Boyer-Moore-style search needs to test only about every 9th character in the best case.

    Here's an example I wrote for an earlier question: code review: finding </body> tag reverse search on a non-null terminated char str
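    As a separate illustration of the skipping idea (this is not the linked code), here is a minimal sketch of a Boyer-Moore-Horspool search over raw bytes. The function name horspool_find and the "return the buffer length when absent" convention are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: Boyer-Moore-Horspool search over raw bytes.
// Returns the offset of the first match, or `hay_len` if there is none.
std::size_t horspool_find(const char* hay, std::size_t hay_len,
                          const char* needle, std::size_t m) {
    assert(m > 0);
    if (m > hay_len) return hay_len;
    // shift[b]: how far the window may slide when byte b sits under the
    // last position of the needle. Bytes not in the needle allow a full jump.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;

    std::size_t pos = 0;
    while (pos <= hay_len - m) {
        // Compare the window against the needle from right to left.
        std::size_t i = m;
        while (i > 0 && hay[pos + i - 1] == needle[i - 1]) --i;
        if (i == 0) return pos;  // every byte matched
        // Slide by the shift for the byte under the window's last position.
        pos += shift[static_cast<unsigned char>(hay[pos + m - 1])];
    }
    return hay_len;
}
```

    When the byte under the end of the window does not occur in the needle at all, the window jumps by the full needle length, which is where the "every 9th character" best case for "something" comes from.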

  • 2021-01-24 01:26

    For a 5-10 MB file I would have a look at writev() if your system supports it (it is a POSIX call). Read the entire file into memory, since it is small enough. Scan for the bytes you want to drop. Pass writev() the list of iovecs (which are just pointers into your read buffer, plus lengths) and you can write out the entire modified contents in a single system call.
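    A minimal sketch of that idea for a single deleted range, assuming a POSIX system. The helper name write_without_range, the output path parameter, and the 0644 file mode are illustrative choices, not something the answer specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>     // open
#include <sys/types.h> // ssize_t
#include <sys/uio.h>   // writev, struct iovec
#include <unistd.h>    // close
#include <vector>

// Hypothetical helper: writes `buf` to the file at `path`, leaving out the
// `count` bytes that start at `offset`. The two kept ranges are described by
// iovecs pointing into `buf`, so no bytes are moved in memory.
bool write_without_range(const char* path, std::vector<char>& buf,
                         std::size_t offset, std::size_t count) {
    assert(offset <= buf.size() && count <= buf.size() - offset);
    const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    iovec parts[2];
    parts[0].iov_base = buf.data();                   // bytes before the gap
    parts[0].iov_len = offset;
    parts[1].iov_base = buf.data() + offset + count;  // bytes after the gap
    parts[1].iov_len = buf.size() - offset - count;

    const ssize_t written = writev(fd, parts, 2);
    const bool closed = (close(fd) == 0);
    // A short write is possible in principle; a robust tool would loop here.
    return closed && written >= 0 &&
           static_cast<std::size_t>(written) == buf.size() - count;
}
```

    Deleting several ranges works the same way with one iovec per kept range, up to the system's IOV_MAX limit per call.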
