We can really only guess. Since you have enough free memory (and non-contiguous virtual address space), the problem is most likely related to not being able to allocate enough contiguous memory. The things that need the most contiguous memory are almost exclusively arrays, like the backing array for your queue. When everything works correctly, the address space is compacted regularly (as part of the GC) and you maximize the available contiguous memory. If this doesn't happen, something is preventing compaction from working correctly - for example, pinned handles, like the ones used for I/O.
Why does an explicit GC.Collect() kind of help? It might very well be that you're at a point where all those pinned handles have been released, so the compaction actually works. Try using something like VMMap or CLRProfiler to see how the objects are laid out in the address space - the typical case of compaction issues is when you have something like 99% free space in your memory, but nowhere near enough contiguous space to allocate a new object (strings and arrays don't cope well with memory fragmentation). Another typical case is when you neglect to call GC.AddMemoryPressure when allocating unmanaged memory (e.g. for the buffers), so the GC has no idea that it should really start collecting already. Again, CLRProfiler is very helpful for watching when GC happens and how it maps to memory usage.
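If you do allocate unmanaged memory for the buffers, the pressure calls are easy to pair up with the allocation. A minimal sketch (the UnmanagedBuffer wrapper and its names are just for illustration, not an existing type):

using System;
using System.Runtime.InteropServices;

// Illustrative wrapper: the unmanaged allocation is reported to the GC so it
// knows the process is heavier than the managed heap alone suggests.
class UnmanagedBuffer : IDisposable
{
    private readonly long _size;
    public IntPtr Pointer { get; private set; }

    public UnmanagedBuffer(long size)
    {
        _size = size;
        Pointer = Marshal.AllocHGlobal((IntPtr)size);
        GC.AddMemoryPressure(size);          // "I just allocated this much outside the GC's view"
    }

    public void Dispose()
    {
        if (Pointer == IntPtr.Zero)
            return;

        Marshal.FreeHGlobal(Pointer);
        GC.RemoveMemoryPressure(_size);      // the matching "and now it's gone again"
        Pointer = IntPtr.Zero;
    }
}

The important part is that every AddMemoryPressure has a matching RemoveMemoryPressure, otherwise the GC ends up with a permanently skewed picture of the process.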
If memory fragmentation is indeed the problem, you need to figure out why. This is actually somewhat complex, and may require some work with something like WinDbg, which is rather hard to use, to say the least. I/O always means some pinned buffers, so if you're doing a lot of I/O in parallel, you're interfering with the proper functioning of the GC. The GC tries to deal with this by creating multiple heaps (depending on the exact configuration of GC you're running, but looking at your case, server GC should really be what you're using - you are running this on Windows Server, right?), and I've seen hundreds of heaps being created to "fix" the fragmentation issue - but ultimately, this is destined to fail.
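If you're not sure which GC the process actually ended up with, you can check from code (GCSettings is the real API, the snippet around it is throwaway):

using System;
using System.Runtime;

class GcInfo
{
    static void Main()
    {
        // Which GC flavour did the process actually get, and in which latency mode?
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);
    }
}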
If you have to use pinned handles, you really want to allocate them once and reuse them if possible. Pinning prevents the GC from doing its job, so you should only pin objects that don't need to be moved in memory (large object heap objects, pre-allocated buffers at the bottom of the heap...), or at least pin for as short a time as possible.
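A minimal sketch of the allocate-once-and-reuse idea, assuming you pin via GCHandle (the PinnedBuffer type and names are illustrative, not something from your code):

using System;
using System.Runtime.InteropServices;

// Illustrative: one buffer, pinned once up front and reused for every I/O call,
// instead of pinning a fresh array each time.
class PinnedBuffer : IDisposable
{
    public readonly byte[] Bytes;
    public readonly IntPtr Address;
    private GCHandle _handle;

    public PinnedBuffer(int size)
    {
        Bytes = new byte[size];
        _handle = GCHandle.Alloc(Bytes, GCHandleType.Pinned);  // pinned for the buffer's whole lifetime
        Address = _handle.AddrOfPinnedObject();
    }

    public void Dispose()
    {
        if (_handle.IsAllocated)
            _handle.Free();                                    // unpin when the buffer is retired
    }
}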
In general, reusing buffers is a good idea. Sadly, that means you want to avoid strings and similar constructs in code like this - strings being immutable means that every single line you read becomes a separately allocated object. Fortunately, you don't necessarily need to deal with strings in your case - a simple byte[] buffer will work just as well - just look for the bytes 0x0D, 0x0A instead of the string "\r\n". The main problem you have is that you need to hold a lot of data in memory at once - you either need to minimize that, or make sure the buffers are allocated where they're used best; for file data, a LOH buffer will help quite a bit.
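To illustrate, a sketch of scanning raw bytes for line ends without a single string allocation (the path and the 1 MiB buffer size are placeholders):

using System.IO;

static class LineScanner
{
    // Counts lines by looking at raw bytes - no strings, no per-line allocations.
    public static long CountLines(string path)
    {
        var buffer = new byte[1024 * 1024];   // one reusable buffer for the whole file
        long lines = 0;

        using (var file = File.OpenRead(path))
        {
            int bytesRead;
            while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < bytesRead; i++)
                {
                    if (buffer[i] == 0x0A)    // '\n' - the second half of "\r\n"
                        lines++;
                }
            }
        }

        return lines;
    }
}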
One way to avoid so many allocations would be to parse the file looking for end-of-lines and remembering just the offset of the line where you want to start copying. As you go, line by line (using the reusable byte[] buffer), you'll just update the offset of the "at most 100 000th line from the end" rather than allocating and freeing strings. Of course, this does mean you have to read some of the data twice - that's just the price of dealing with data that isn't fixed-length and/or indexed :)
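A sketch of that idea - a queue of line-start offsets in the first pass, a plain byte copy in the second (all names, paths and sizes are illustrative):

using System.Collections.Generic;
using System.IO;

static class TailCopier
{
    // First pass: remember the offsets of the last `lineCount` line starts.
    // Second pass: stream everything from the oldest remembered offset.
    public static void CopyLastLines(string sourcePath, string targetPath, int lineCount)
    {
        var lineStarts = new Queue<long>(lineCount);
        var buffer = new byte[1024 * 1024];

        using (var source = File.OpenRead(sourcePath))
        {
            long fileLength = source.Length;
            long position = 0;
            int bytesRead;

            lineStarts.Enqueue(0);                             // the first line starts at offset 0

            while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < bytesRead; i++)
                {
                    long nextLineStart = position + i + 1;
                    if (buffer[i] == 0x0A && nextLineStart < fileLength)
                    {
                        if (lineStarts.Count == lineCount)
                            lineStarts.Dequeue();              // drop the oldest offset...
                        lineStarts.Enqueue(nextLineStart);     // ...and remember the newest line start
                    }
                }
                position += bytesRead;
            }

            // Stream the tail of the file to the target - no strings anywhere.
            source.Seek(lineStarts.Peek(), SeekOrigin.Begin);

            using (var target = File.Create(targetPath))
            {
                while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
                    target.Write(buffer, 0, bytesRead);
            }
        }
    }
}

The queue never holds more than lineCount numbers, so the memory cost stays tiny no matter how big the file gets.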
Another approach is to read the file from the end. How well this works is hard to predict, since it depends a lot on how well the OS and filesystem handle backwards reading. In some cases, it's just as good as forward reading - both are sequential reads, it's just a question of whether the OS/FS is smart enough to figure that out or not. In some cases, it's going to be very expensive - if that's the case, use large file buffers (e.g. 16 MiB instead of the more customary 4 kiB) to get reads that are as close to sequential as possible. Counting from the back still doesn't quite allow you to stream the data directly to another file (you'll either need to combine this with the first approach or keep the whole 100 000 lines in memory at once yet again), but it means you only ever read the data you're going to use (the most you over-read is the size of your buffer).
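A sketch of the backwards scan, assuming you only need the offset where the last N lines start (names and sizes are again illustrative):

using System;
using System.IO;

static class ReverseScanner
{
    // Scans the file backwards in large chunks and returns the offset where
    // the last `lineCount` lines begin (0 if the file has fewer lines).
    public static long FindTailOffset(string path, int lineCount)
    {
        var buffer = new byte[16 * 1024 * 1024];   // big chunks to keep the reads as sequential as possible

        using (var file = File.OpenRead(path))
        {
            long position = file.Length;
            int newlines = 0;

            // A file that doesn't end with '\n' still ends with a (partial) line.
            if (position > 0)
            {
                file.Seek(position - 1, SeekOrigin.Begin);
                if (file.ReadByte() != 0x0A)
                    newlines = 1;
            }

            while (position > 0)
            {
                int chunkSize = (int)Math.Min(buffer.Length, position);
                position -= chunkSize;

                file.Seek(position, SeekOrigin.Begin);
                ReadExactly(file, buffer, chunkSize);

                for (int i = chunkSize - 1; i >= 0; i--)
                {
                    // Each '\n' we pass going backwards means one more complete line behind us.
                    if (buffer[i] == 0x0A && ++newlines > lineCount)
                        return position + i + 1;
                }
            }
        }

        return 0;
    }

    private static void ReadExactly(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
    }
}

Once you have the offset, the copy itself is the same streaming loop as at the end of this answer.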
Finally, if all else fails, you could use unmanaged memory for some of the work you're doing. I hope I don't have to say this is much trickier than using managed memory - you'll have to be very careful about proper addressing and bounds-checking, among other things. For a task like yours, it's still quite manageable - ultimately, you're just moving lots of bytes with very little "work". You'd better understand the unmanaged world well, though - otherwise it's just going to lead to bugs that are very hard to track down and fix.
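Purely to illustrate the kind of bookkeeping involved, a sketch of pushing bytes through unmanaged memory with the allocation, the lengths and the free kept explicitly paired (Roundtrip is a made-up example, not something you'd use as-is):

using System;
using System.Runtime.InteropServices;

static class UnmanagedCopy
{
    // Illustration only: nothing on the unmanaged side checks lengths for you,
    // so keep them explicit and keep every allocation paired with a free.
    public static void Roundtrip(byte[] data)
    {
        IntPtr unmanaged = Marshal.AllocHGlobal(data.Length);
        try
        {
            Marshal.Copy(data, 0, unmanaged, data.Length);   // managed -> unmanaged
            var back = new byte[data.Length];
            Marshal.Copy(unmanaged, back, 0, data.Length);   // unmanaged -> managed
        }
        finally
        {
            Marshal.FreeHGlobal(unmanaged);                  // no GC to clean up after you here
        }
    }
}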
EDIT:
Since you made it clear that the "last 100k items" is a workaround and not the desired solution, the easiest thing is to simply stream the data rather than reading everything to RAM and writing everything in one go. If File.Copy/File.Move aren't good enough for you, you can use something like this:
var buffer = new byte[4096];

using (var sourceFile = File.OpenRead(...))
using (var targetFile = File.Create(...))
{
    while (true)
    {
        // Read the next chunk; Read returns 0 only at the end of the stream.
        var bytesRead = sourceFile.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0)
            break;

        targetFile.Write(buffer, 0, bytesRead);
    }
}
The only memory you need is for the (relatively small) buffers.