Question
I am trying to find a way to modify a text file (specifically, to delete specific lines) without reading a large part of the file into memory or rewriting the whole file. The files in question are larger than main memory, about 15-50 GB.
P.S. I am using Linux.
Answer 1:
You aren't going to get around making a new file, so just bite the bullet and do it. Use grep with appropriate options and redirect the result to a second file:
$ grep -v -f patternsToExcludeFromInput input > output
Another approach is to put the patterns into, for example, a hash table (Perl), a dictionary (Python), or an unordered_map (C++), and check each line of your input file for a match.
If there is no match, print the line to the standard output stream (which you can redirect to a regular file). Your memory usage will be limited mostly to the hash table and the line of input you are currently checking.
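The lookup-table approach above can be sketched in Python as follows. This is a minimal illustration, not the answerer's code: the function name filter_lines and the sample data are hypothetical, and it assumes exact whole-line matches rather than grep-style patterns.

```python
import io

def filter_lines(src, dst, excluded):
    """Copy lines from src to dst, skipping lines whose text is in excluded."""
    kept = 0
    for line in src:  # file objects yield one line at a time
        if line.rstrip("\n") not in excluded:  # average O(1) set lookup
            dst.write(line)
            kept += 1
    return kept

# Small in-memory demo; real use would pass open file handles instead.
excluded = {"delete me", "and me"}
src = io.StringIO("keep 1\ndelete me\nkeep 2\nand me\n")
dst = io.StringIO()
kept = filter_lines(src, dst, excluded)
result = dst.getvalue()
```

Only the set of excluded lines and the current line are held in memory, so the size of the input file does not matter. For real files, src and dst could be handles from open(), or sys.stdin and sys.stdout.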
Answer 2:
If the file is much larger than memory, sed is your friend. It acts as a filter between your old file and a new file; at the end, you just rename the new file to the old name. The syntax is a bit strange for newcomers, but it is really powerful: it can select lines by number, by regex, or by range, and apply insertions, deletions, or string substitutions.
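The same filter-then-rename pattern can be sketched in Python. This is an illustration under assumptions, not a sed replacement: delete_lines and its parameters are hypothetical names, and it roughly mimics what sed -e '1,2d' -e '/^#/d' would do when its output is redirected to a new file.

```python
import os
import re
import tempfile

def delete_lines(path, pattern=None, first=1, last=0):
    """Rewrite path without lines matching pattern or numbered first..last (1-based)."""
    regex = re.compile(pattern) if pattern is not None else None
    directory = os.path.dirname(os.path.abspath(path))
    # Create the new file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            for number, line in enumerate(src, start=1):  # streams one line at a time
                if first <= number <= last:
                    continue
                if regex is not None and regex.search(line):
                    continue
                out.write(line)
        os.replace(tmp_path, path)  # rename the new file over the old one
    except BaseException:
        os.unlink(tmp_path)
        raise

# Demo on a small throwaway file.
demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo, "wb") as f:
    f.write(b"one\ntwo\n# comment\nfour\nfive\n")
delete_lines(demo, pattern=rb"^#", first=1, last=2)
with open(demo, "rb") as f:
    remaining = f.read()
```

Binary mode keeps every byte of the surviving lines unchanged. Note that the file created by mkstemp has restrictive permissions, so the original mode may need to be restored afterwards, and the disk needs enough free space for the second copy.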
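Answer 3:
You can open the file in "r+" (read/update) mode and use fseek, fread, and fwrite to read and write portions of it. You must take care not to overwrite the part you have not read yet. So to delete a line you read and write forward, and to insert a line you read and write backward (starting from the end of the file). After a deletion, the file still has its old length, so it must be truncated to the new length (for example with ftruncate on Linux).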
Example:
To remove the first 100 bytes from the beginning of your file you could do something like:
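#include <stdio.h>
#include <unistd.h>  /* ftruncate, fileno (POSIX) */

int remove_first_100_bytes(const char *filename) {
    FILE *fp = fopen(filename, "r+b");  /* "rw" is not a valid mode; "r+" opens for update */
    if (!fp) return -1;
    enum { BLOCK_SIZE = 1024 };
    char buffer[BLOCK_SIZE];
    long offset = 100;  /* number of leading bytes to drop */
    fseek(fp, 0, SEEK_END);
    long length = ftell(fp);  /* where long is 32 bits, use fseeko/ftello with off_t */
    for (long pos = offset; pos < length; ) {
        fseek(fp, pos, SEEK_SET);
        size_t count = fread(buffer, 1, BLOCK_SIZE, fp);  /* buffer first, stream last */
        if (count == 0) break;
        fseek(fp, pos - offset, SEEK_SET);  /* write position always trails the read position */
        fwrite(buffer, 1, count, fp);
        pos += (long)count;
    }
    fflush(fp);
    int rc = length > offset ? ftruncate(fileno(fp), length - offset) : 0;  /* cut off the stale tail */
    fclose(fp);
    return rc;
}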
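The shifting technique generalizes from "drop the first 100 bytes" to deleting any byte range, such as one line in the middle of the file. The Python sketch below is a hedged illustration: delete_range and its parameters are hypothetical names, and the byte offsets of the line to remove are assumed to be known already (finding them still requires a streaming scan of the file).

```python
import os
import tempfile

def delete_range(path, start, end, block_size=1 << 20):
    """Remove bytes [start, end) from path in place by shifting the tail forward."""
    gap = end - start
    with open(path, "r+b") as f:
        length = f.seek(0, os.SEEK_END)  # seek returns the new absolute position
        if not 0 <= start <= end <= length:
            raise ValueError("range outside file")
        read_pos = end
        while read_pos < length:
            f.seek(read_pos)
            block = f.read(block_size)
            if not block:
                break
            f.seek(read_pos - gap)  # write position always trails the read position
            f.write(block)
            read_pos += len(block)
        f.truncate(length - gap)  # drop the stale bytes left at the end

# Demo: remove the second line (bytes 7..13) from a small throwaway file.
fd, demo = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"line 1\nline 2\nline 3\n")
delete_range(demo, 7, 14, block_size=4)
with open(demo, "rb") as f:
    after = f.read()
os.unlink(demo)
```

Memory use is bounded by block_size and no second copy of the file is needed, but everything after the deleted range is still rewritten, and an interruption partway through leaves the file in an inconsistent state.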
Source: https://stackoverflow.com/questions/24615237/modifying-a-text-file-without-reading-into-memory