We can really only guess. Since you have enough free memory (and non-contiguous virtual address space), the problem is most likely related to not being able to allocate enough contiguous memory. The things that need the most contiguous memory are almost exclusively arrays, like the backing array for your queue. When everything works correctly, the address space is compacted regularly (as part of the GC) and you maximize the available contiguous memory. If this doesn't happen, something is preventing compaction from working correctly - for example, pinned handles, like the ones used for I/O.
Why does an explicit GC.Collect() kind of help? It might very well be that you're at a point where all those pinned handles have been released, so the compaction actually works. Try using something like VMMap or CLRProfiler to see how the objects are laid out in the address space - the typical case of compaction issues is when you have something like 99% free space in your memory, but nowhere near enough contiguous space to allocate a new object (strings and arrays don't cope well with memory fragmentation). Another typical case is when you neglect to call GC.AddMemoryPressure when allocating unmanaged memory (e.g. for the buffers), so the GC has no idea that it should really start collecting already. Again, CLRProfiler is very helpful for watching when GC happens and how it maps to memory usage.
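If you do allocate unmanaged memory for the buffers, the pressure calls are easy to pair up with the allocation. A minimal sketch (the UnmanagedBuffer wrapper and its names are just for illustration, not an existing type):

using System;
using System.Runtime.InteropServices;

// Illustrative wrapper: the unmanaged allocation is reported to the GC so it
// knows the process is heavier than the managed heap alone suggests.
class UnmanagedBuffer : IDisposable
{
    private readonly long _size;
    public IntPtr Pointer { get; private set; }

    public UnmanagedBuffer(long size)
    {
        _size = size;
        Pointer = Marshal.AllocHGlobal((IntPtr)size);
        GC.AddMemoryPressure(size);          // "I just allocated this much outside the GC's view"
    }

    public void Dispose()
    {
        if (Pointer == IntPtr.Zero)
            return;

        Marshal.FreeHGlobal(Pointer);
        GC.RemoveMemoryPressure(_size);      // the matching "and now it's gone again"
        Pointer = IntPtr.Zero;
    }
}

The important part is that every AddMemoryPressure has a matching RemoveMemoryPressure, otherwise the GC ends up with a permanently skewed picture of the process.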
If memory fragmentation is indeed the problem, you need to figure out why. This is actually somewhat complex, and may require some work with something like WinDbg, which is rather hard to use, to say the least. I/O always means some pinned buffers, so if you're doing a lot of I/O in parallel, you're interfering with the proper functioning of the GC. The GC tries to deal with this by creating multiple heaps (depending on the exact configuration of GC you're running, but looking at your case, server GC should really be what you're using - you are running this on Windows Server, right?), and I've seen hundreds of heaps being created to "fix" the fragmentation issue - but ultimately, this is destined to fail.
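If you're not sure which GC the process actually ended up with, you can check from code (GCSettings is the real API, the snippet around it is throwaway):

using System;
using System.Runtime;

class GcInfo
{
    static void Main()
    {
        // Which GC flavour did the process actually get, and in which latency mode?
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);
    }
}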
If you have to use pinned handles, you really want to allocate them once and reuse them if possible. Pinning prevents the GC from doing its job, so you should only pin objects that don't need to be moved in memory (large object heap objects, pre-allocated buffers at the bottom of the heap...), or at least pin for as short a time as possible.
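A minimal sketch of the allocate-once-and-reuse idea, assuming you pin via GCHandle (the PinnedBuffer type and names are illustrative, not something from your code):

using System;
using System.Runtime.InteropServices;

// Illustrative: one buffer, pinned once up front and reused for every I/O call,
// instead of pinning a fresh array each time.
class PinnedBuffer : IDisposable
{
    public readonly byte[] Bytes;
    public readonly IntPtr Address;
    private GCHandle _handle;

    public PinnedBuffer(int size)
    {
        Bytes = new byte[size];
        _handle = GCHandle.Alloc(Bytes, GCHandleType.Pinned);  // pinned for the buffer's whole lifetime
        Address = _handle.AddrOfPinnedObject();
    }

    public void Dispose()
    {
        if (_handle.IsAllocated)
            _handle.Free();                                    // unpin when the buffer is retired
    }
}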
In general, reusing buffers is a good idea. Sadly, that means you want to avoid strings and similar constructs in code like this - strings being immutable means that every single line you read becomes a separately allocated object. Fortunately, you don't necessarily need to deal with strings in your case - a simple byte[] buffer will work just as well - just look for the bytes 0x0D, 0x0A instead of the string "\r\n". The main problem you have is that you need to hold a lot of data in memory at once - you either need to minimize that, or make sure the buffers are allocated where they're used best; for file data, a LOH buffer will help quite a bit.
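To illustrate, a sketch of scanning raw bytes for line ends without a single string allocation (the path and the 1 MiB buffer size are placeholders):

using System.IO;

static class LineScanner
{
    // Counts lines by looking at raw bytes - no strings, no per-line allocations.
    public static long CountLines(string path)
    {
        var buffer = new byte[1024 * 1024];   // one reusable buffer for the whole file
        long lines = 0;

        using (var file = File.OpenRead(path))
        {
            int bytesRead;
            while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < bytesRead; i++)
                {
                    if (buffer[i] == 0x0A)    // '\n' - the second half of "\r\n"
                        lines++;
                }
            }
        }

        return lines;
    }
}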
One way to avoid so many allocations would be to parse the file looking for end-of-lines and remembering just the offset of the line where you want to start copying. As you go, line by line (using the reusable byte[] buffer), you'll just update the offset of the "at most 100 000th line from the end" rather than allocating and freeing strings. Of course, this does mean you have to read some of the data twice - that's just the price of dealing with data that isn't fixed-length and/or indexed :)
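A sketch of that idea - a queue of line-start offsets in the first pass, a plain byte copy in the second (all names, paths and sizes are illustrative):

using System.Collections.Generic;
using System.IO;

static class TailCopier
{
    // First pass: remember the offsets of the last `lineCount` line starts.
    // Second pass: stream everything from the oldest remembered offset.
    public static void CopyLastLines(string sourcePath, string targetPath, int lineCount)
    {
        var lineStarts = new Queue<long>(lineCount);
        var buffer = new byte[1024 * 1024];

        using (var source = File.OpenRead(sourcePath))
        {
            long fileLength = source.Length;
            long position = 0;
            int bytesRead;

            lineStarts.Enqueue(0);                             // the first line starts at offset 0

            while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < bytesRead; i++)
                {
                    long nextLineStart = position + i + 1;
                    if (buffer[i] == 0x0A && nextLineStart < fileLength)
                    {
                        if (lineStarts.Count == lineCount)
                            lineStarts.Dequeue();              // drop the oldest offset...
                        lineStarts.Enqueue(nextLineStart);     // ...and remember the newest line start
                    }
                }
                position += bytesRead;
            }

            // Stream the tail of the file to the target - no strings anywhere.
            source.Seek(lineStarts.Peek(), SeekOrigin.Begin);

            using (var target = File.Create(targetPath))
            {
                while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
                    target.Write(buffer, 0, bytesRead);
            }
        }
    }
}

The queue never holds more than lineCount numbers, so the memory cost stays tiny no matter how big the file gets.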
Another approach is to read the file from the end. How well this works is hard to predict, since it depends a lot on how well the OS and filesystem handle backwards reading. In some cases, it's just as good as forward reading - both are sequential reads, it's just a question of whether the OS/FS is smart enough to figure that out or not. In some cases, it's going to be very expensive - if that's the case, use large file buffers (e.g. 16 MiB instead of the more customary 4 kiB) to get reads that are as close to sequential as possible. Counting from the back still doesn't quite allow you to stream the data directly to another file (you'll either need to combine this with the first approach or keep the whole 100 000 lines in memory at once yet again), but it means you only ever read the data you're going to use (the most you over-read is the size of your buffer).
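A sketch of the backwards scan, assuming you only need the offset where the last N lines start (names and sizes are again illustrative):

using System;
using System.IO;

static class ReverseScanner
{
    // Scans the file backwards in large chunks and returns the offset where
    // the last `lineCount` lines begin (0 if the file has fewer lines).
    public static long FindTailOffset(string path, int lineCount)
    {
        var buffer = new byte[16 * 1024 * 1024];   // big chunks to keep the reads as sequential as possible

        using (var file = File.OpenRead(path))
        {
            long position = file.Length;
            int newlines = 0;

            // A file that doesn't end with '\n' still ends with a (partial) line.
            if (position > 0)
            {
                file.Seek(position - 1, SeekOrigin.Begin);
                if (file.ReadByte() != 0x0A)
                    newlines = 1;
            }

            while (position > 0)
            {
                int chunkSize = (int)Math.Min(buffer.Length, position);
                position -= chunkSize;

                file.Seek(position, SeekOrigin.Begin);
                ReadExactly(file, buffer, chunkSize);

                for (int i = chunkSize - 1; i >= 0; i--)
                {
                    // Each '\n' we pass going backwards means one more complete line behind us.
                    if (buffer[i] == 0x0A && ++newlines > lineCount)
                        return position + i + 1;
                }
            }
        }

        return 0;
    }

    private static void ReadExactly(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
    }
}

Once you have the offset, the copy itself is the same streaming loop as at the end of this answer.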
Finally, if all else fails, you could use unmanaged memory for some of the work you're doing. I hope I don't have to say this is much trickier than using managed memory - you'll have to be very careful about proper addressing and bounds-checking, among other things. For a task like yours, it's still quite manageable - ultimately, you're just moving lots of bytes with very little "work". You'd better understand the unmanaged world well, though - otherwise it's just going to lead to bugs that are very hard to track down and fix.
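Purely to illustrate the kind of bookkeeping involved, a sketch of pushing bytes through unmanaged memory with the allocation, the lengths and the free kept explicitly paired (Roundtrip is a made-up example, not something you'd use as-is):

using System;
using System.Runtime.InteropServices;

static class UnmanagedCopy
{
    // Illustration only: nothing on the unmanaged side checks lengths for you,
    // so keep them explicit and keep every allocation paired with a free.
    public static void Roundtrip(byte[] data)
    {
        IntPtr unmanaged = Marshal.AllocHGlobal(data.Length);
        try
        {
            Marshal.Copy(data, 0, unmanaged, data.Length);   // managed -> unmanaged
            var back = new byte[data.Length];
            Marshal.Copy(unmanaged, back, 0, data.Length);   // unmanaged -> managed
        }
        finally
        {
            Marshal.FreeHGlobal(unmanaged);                  // no GC to clean up after you here
        }
    }
}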
EDIT:
Since you made it clear that the "last 100k items" is a workaround and not the desired solution, the easiest thing is to simply stream the data rather than reading everything to RAM and writing everything in one go. If File.Copy/File.Move aren't good enough for you, you can use something like this:
var buffer = new byte[4096];

using (var sourceFile = File.OpenRead(...))
using (var targetFile = File.Create(...))
{
    while (true)
    {
        // Read the next chunk; Read returns 0 only at the end of the stream.
        var bytesRead = sourceFile.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0)
            break;

        targetFile.Write(buffer, 0, bytesRead);
    }
}
The only memory you need is for the (relatively small) buffers.