I've found loads of examples on how to replace text in files using regex. However, it all boils down to two versions:
1. Iterate over all lines in the file and apply regex to
If you don't mind getting your hands a little dirty (and your regex is simple enough, or perhaps you have a strong desire for speed and don't mind suffering a bit), you can use Ragel. It can target C#, though the site doesn't mention it. To use it with large files, though, you'll need to wrap a FileStream to provide a buffered indexer, or use a memory-mapped file (with unsafe pointers) in a 64-bit process.
I would say you should pre-parse/normalize the data before doing your replacements so that each line describes one possible set of data that needs to have replacements applied. Otherwise you get into complications with data integrity that cannot really be solved without a host of other difficulties.
If there is a way to chunk the data into logical blocks, then you could build a program that uses a MapReduce pattern to parse the data.
Perhaps you could load 2 lines at a time (or more, depending on how many lines you think your matches are going to span) and overlap them: load lines 1-2, then on the next loop load lines 2-3, then 3-4, and so on. In each loop, run your multiline regexes over the combined lines.
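A minimal sketch of that overlapping-window idea, written in Python for brevity (the same structure should carry over to C#). The function name `find_multiline` and its parameters are made up for illustration. It assumes no match spans more than `window` lines, and it reports character offsets, not byte offsets. Because overlapping windows see the same text twice, a match is only reported from the window in which it starts on the first line, with a final pass over the leftover tail.

```python
import re
from collections import deque

def find_multiline(lines, pattern, window=2, flags=0):
    """Yield (absolute_offset, matched_text) while holding only `window` lines in memory."""
    regex = re.compile(pattern, flags)
    buf = deque()      # (absolute offset, text) for each buffered line
    offset = 0         # absolute offset of the next line to be read
    last_end = 0       # end offset of the last reported match

    for line in lines:
        buf.append((offset, line))
        offset += len(line)
        if len(buf) < window:
            continue
        base, first = buf[0]
        text = "".join(t for _, t in buf)
        for m in regex.finditer(text):
            if m.start() >= len(first):
                break  # a later window will see this match with more context
            if base + m.start() >= last_end:
                last_end = base + m.end()
                yield base + m.start(), m.group()
        buf.popleft()  # slide the window forward by one line

    # Flush: report matches that start in the lines still buffered at EOF.
    if buf:
        base = buf[0][0]
        text = "".join(t for _, t in buf)
        for m in regex.finditer(text):
            if base + m.start() >= last_end:
                last_end = base + m.end()
                yield base + m.start(), m.group()
```

Passing an open file object as `lines` keeps memory use bounded by the window size. The catch is the one already implied above: a match longer than `window` lines is silently missed or truncated, so the window has to be sized for the longest match you expect.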
I'm with Bart; you really should be using some kind of parser for this.
Or, if you don't mind spawning a child process, you could just use sed (there's a native port for Windows, or you can use Cygwin).
Here's the Answer:
There is no easy way
I found a StreamRegex class which might be able to do what I am looking for.
From what I could grasp of the algorithm:
That way it is not necessary to load the full file, or at least the chances of loading the full file into memory are reduced...
However, the worst case is that there is no match in the whole file; in that case the full file will be loaded into memory.
Regex is not the way to go, especially not with these large amounts of text. Create a little parser of your own:
That will give you the start and end offsets of all the comment blocks. You should then be able to replace them by creating a temp file and copying the text from the original file into it (writing something else instead whenever you're inside a comment block, of course).
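The parser code from this answer did not survive here, so the following is only a rough sketch of the idea, not the original. It is in Python for brevity and assumes C-style `/* ... */` block comments; the name `replace_block_comments` and its parameters are invented for illustration. It reads the source in small chunks, records the offsets of each comment block, and writes everything else (plus an optional replacement) to a second stream, so the whole file never has to sit in memory.

```python
def replace_block_comments(src, dst, replacement="", chunk_size=64 * 1024):
    """Copy text from src to dst, swapping each /* ... */ block for `replacement`.

    Returns a list of (start, end) character offsets of the comment blocks.
    """
    spans = []
    in_comment = False
    pending = ""   # text read but not yet processed
    pos = 0        # absolute offset of the start of `pending`
    start = 0      # absolute offset where the current comment began

    while True:
        chunk = src.read(chunk_size)
        at_eof = not chunk
        pending += chunk
        while True:
            marker = "*/" if in_comment else "/*"
            i = pending.find(marker)
            if i < 0:
                break
            if in_comment:
                spans.append((start, pos + i + 2))
                dst.write(replacement)
            else:
                dst.write(pending[:i])
                start = pos + i
            pending = pending[i + 2:]
            pos += i + 2
            in_comment = not in_comment
        # Hold back one character: it may be the first half of a marker
        # that is split across two chunks.
        cut = len(pending) if at_eof else max(len(pending) - 1, 0)
        if not in_comment:
            dst.write(pending[:cut])
        pending = pending[cut:]
        pos += cut
        if at_eof:
            break

    if in_comment:  # unterminated comment: treat it as running to end of file
        spans.append((start, pos))
        dst.write(replacement)
    return spans
```

In practice `src` would be the original file opened in text mode and `dst` a temp file that replaces the original once the copy succeeds. Note that this sketch knows nothing about string literals, so a `/*` inside a quoted string would be treated as a comment start; a real parser for source code would need an extra state for that.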
Edit: source files of 2GiB??