We have an application that is running on 5 (server) nodes (16 cores, 128 GB Memory each) that loads almost 70 GB data on each machine. This application is distributed and serve
Note that while an event handler is subscribed, the publisher of the event holds a reference to the subscriber. This is a common cause of memory leaks in .NET. In your case it would not be a serious leak on its own, but if such a managed object holds a pointer or handle to an unmanaged object, the unmanaged object is never freed either, which can cause memory fragmentation.
If you are sure that the unmanaged component is the cause of the fragmentation and that you are not missing something, and you have access to its source code, you can recompile it and link it against a better memory allocator such as Hoard. But treat this as a last resort, and only after serious profiling.
An educated guess, without seeing your code, is that you have an STA deadlock during finalisation, especially since this looks like a high-concurrency system judging by your hefty hardware. Given that you have already tried forcing a GC, a deadlock makes sense: if finalisation is deadlocked, the GC cannot do its job. Hope this helps you.
Advanced Techniques to Prevent and Detect Deadlocks in .Net Applications
The section of interest is quoted below:
When your code is executing on a single-threaded apartment (STA) thread, the equivalent of an exclusive lock occurs. Only one thread can update a GUI window or run code inside an Apartment-threaded COM component inside an STA at once. Such threads own a message queue into which to-be-processed information is placed by the system and other parts of the application. GUIs use this queue for information such as repaint requests, device input to be processed, and window close requests. COM proxies use the message queue to transitioning cross-Apartment method calls into the apartment for which a component has affinity. All code running on an STA is responsible for pumping the message queue—looking for and processing new messages using the message loop—otherwise the queue can become clogged, leading to lost responsiveness. In Win32 terms, this means using the MsgWaitForSingleObject, MsgWaitForMultipleObjects (and their Ex counterparts), or CoWaitForMultipleHandles APIs. A non-pumping wait such as WaitForSingleObject or WaitForMultipleObjects (and their Ex counterparts) won't pump incoming messages.
In other words, the STA "lock" can only be released by pumping the message queue. Applications that perform operations whose performance characteristics vary greatly on the GUI thread without pumping for messages, like those noted earlier, can easily deadlock. Well-written programs either schedule such long-running work to occur elsewhere, or pump for messages each time they block to avoid this problem. Thankfully, the CLR pumps for you whenever you block in managed code (via a call to a contentious Monitor.Enter, WaitHandle.WaitOne, FileStream.EndRead, Thread.Join, and so forth), helping to mitigate this problem. But plenty of code—and even some fraction of the .NET Framework itself—ends up blocking in unmanaged code, in which case a pumping wait may or may not have been added by the author of the blocking code.
Here's a classic example of an STA-induced deadlock. A thread running in an STA generates a large quantity of Apartment threaded COM component instances and, implicitly, their corresponding Runtime Callable Wrappers (RCWs). Of course, these RCWs must be finalized by the CLR when they become unreachable, or they will leak. But the CLR's finalizer thread always joins the process's Multithreaded Apartment (MTA), meaning it must use a proxy that transitions to the STA in order to call Release on the RCWs. If the STA isn't pumping to receive the finalizer's attempt to invoke the Finalize method on a given RCW—perhaps because it has chosen to block using a non-pumping wait—the finalizer thread will be stuck. It is blocked until the STA unblocks and pumps. If the STA never pumps, the finalizer thread will never make any progress, and a slow, silent build-up of all finalizable resources will occur over time. This can, in turn, lead to a subsequent out-of-memory crash or a process-recycle in ASP.NET. Clearly, both outcomes are unsatisfactory. High-level frameworks like Windows Forms, Windows Presentation Foundation, and COM hide much of the complexity of STAs, but they can still fail in unpredictable ways, including deadlocking. COM synchronization contexts introduce similar, but subtly different, challenges. And furthermore, many of these failures will only occur in a small fraction of test runs and often only under high stress.
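One rough way to check whether the finalizer thread is stuck, as the quoted scenario describes, is to wait for pending finalizers on a separate thread and give up after a timeout. This is only a sketch: `FinalizerProbe` and `FinalizersDrainWithin` are hypothetical names, and a timeout only hints at a blocked finalizer rather than proving an STA deadlock.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the .NET Framework.
public static class FinalizerProbe
{
    // Returns true if the finalizer queue drained within the timeout,
    // false if the wait is still blocked when the timeout expires.
    public static bool FinalizersDrainWithin(TimeSpan timeout)
    {
        // Run the wait on a thread-pool thread so the caller never blocks forever.
        Task probe = Task.Run(() =>
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        });

        // Task.Wait(TimeSpan) returns false when the timeout elapses first.
        return probe.Wait(timeout);
    }
}
```

If this returns false, the probing thread-pool thread stays blocked as well, so it is best used sparingly as a diagnostic. A dump analysed with a debugger is still needed to see what the finalizer thread is actually waiting on.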
In .NET 4.5, the CLR team enhanced large object heap (LOH) allocation. Even then, they still recommend object pooling to help large object performance. It sounds like LOH fragmentation happens less often in 4.5, but it could still happen. But from the stack trace, it looks unrelated to the LOH.
Daniel Lane suggested GC deadlocks. We have seen those happen on production systems, too, and they definitely cause issues with process size and out of memory conditions.
One thing you could do is run Debug Diagnostics Tool, capture a full dump when the OutOfMemoryException occurs, and then have the tool analyze the dump for crash and memory information. I've seen some interesting things happen with both native and managed heaps from this report. For example, we found a printer driver had allocated 1 GB of unmanaged heap on a 32-bit system. Updating the driver fixed the issue. Granted, that was a client system, but something similar could be happening to your server.
I agree that this sounds like a native-mode error. Looking at the implementations of System.Threading.Monitor.Wait, ObjWait, PulseAll, and ObjPulseAll in the .NET 4.5 Reference Source reveals that these methods call into native code:
    /*========================================================================
    ** Sends a notification to all waiting objects.
    ========================================================================*/
    [System.Security.SecurityCritical]  // auto-generated
    [ResourceExposure(ResourceScope.None)]
    [MethodImplAttribute(MethodImplOptions.InternalCall)]
    private static extern void ObjPulseAll(Object obj);

    [System.Security.SecuritySafeCritical]  // auto-generated
    public static void PulseAll(Object obj)
    {
        if (obj == null)
        {
            throw new ArgumentNullException("obj");
        }
        Contract.EndContractBlock();
        ObjPulseAll(obj);
    }
A comment by Mike Dimmick on Raymond Chen's article "PulseEvent is fundamentally flawed" says:
Monitor.PulseAll is a wrapper around Monitor.ObjPulseAll, which is an internal call to the CLR internal function ObjectNative::PulseAll. This in turn wraps ObjHeader::PulseAll, which wraps SyncBlock::PulseAll. This simply sits in a loop calling SetEvent until no more threads are waiting on the object.
If anyone has access to the source code for the CLI, maybe they could post more about this function and where the memory error could be coming from.
The GC doesn't take the unmanaged heap into account. If you are creating lots of C# objects that are merely wrappers around larger blocks of unmanaged memory, then your memory is being devoured, but the GC can't make rational decisions about it because it only sees the managed heap.
You end up in a situation where the garbage collector doesn't think you are short of memory because most of the things on your gen 1 heap are 8-byte references, when in fact they are like icebergs at sea: most of the memory is below the surface!
You can make use of these GC calls:
System::GC::AddMemoryPressure(sizeOfField);
System::GC::RemoveMemoryPressure(sizeOfField);
These methods let the garbage collector account for the unmanaged memory (if you give it the right figures).
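As a minimal C# sketch of how these calls are usually paired, assume a hypothetical wrapper class `NativeBuffer` that owns a block of unmanaged memory. The class name, its members, and the use of `Marshal.AllocHGlobal` are illustrative assumptions, not taken from the application in question.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: a small managed object that owns a large unmanaged block.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _ptr;
    private readonly long _size;

    public NativeBuffer(long sizeInBytes)
    {
        if (sizeInBytes <= 0) throw new ArgumentOutOfRangeException("sizeInBytes");
        _size = sizeInBytes;
        _ptr = Marshal.AllocHGlobal(new IntPtr(sizeInBytes));
        // Tell the GC that this object is heavier than its managed size suggests.
        GC.AddMemoryPressure(sizeInBytes);
    }

    public bool IsReleased
    {
        get { return _ptr == IntPtr.Zero; }
    }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_ptr == IntPtr.Zero) return;
        Marshal.FreeHGlobal(_ptr);
        _ptr = IntPtr.Zero;
        // Must remove exactly the amount that was added in the constructor.
        GC.RemoveMemoryPressure(_size);
    }
}
```

The key point is that every AddMemoryPressure call is matched by a RemoveMemoryPressure call for the same number of bytes when the unmanaged memory is freed. Otherwise the GC's picture of the process drifts away from reality.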
If it is a fragmentation problem then you cannot solve it without some sort of profiling. Look for a memory profiler that supports fragmentation detection to find the exact cause of the fragmentation.
GC.Collect() will only free memory held by objects that are no longer referenced by anything else.
A common scenario where a leak can occur is failing to disconnect an event handler from an object before setting its reference to null.
As an exercise in avoiding leaks, it's a good idea to implement IDisposable on objects (even though it's meant for releasing unmanaged resources), simply to ensure that all handlers are disconnected, collections are cleared correctly, and any other object references are set to null.
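A minimal sketch of that pattern might look like the following. `Publisher`, `Subscriber`, and their members are hypothetical names used only for illustration; `SubscriberCount` exists purely to make the effect of Dispose visible.

```csharp
using System;

// Hypothetical long-lived object that raises an event.
public sealed class Publisher
{
    public event EventHandler Changed;

    public void Raise()
    {
        EventHandler handler = Changed;
        if (handler != null) handler(this, EventArgs.Empty);
    }

    // Number of handlers currently attached (for illustration only).
    public int SubscriberCount
    {
        get { return Changed == null ? 0 : Changed.GetInvocationList().Length; }
    }
}

// Hypothetical short-lived object that listens to the publisher.
public sealed class Subscriber : IDisposable
{
    private Publisher _publisher;

    public int Notifications { get; private set; }

    public Subscriber(Publisher publisher)
    {
        if (publisher == null) throw new ArgumentNullException("publisher");
        _publisher = publisher;
        // From here on the publisher holds a reference to this subscriber.
        _publisher.Changed += OnChanged;
    }

    private void OnChanged(object sender, EventArgs e)
    {
        Notifications++;
    }

    public void Dispose()
    {
        if (_publisher == null) return;
        // Detach the handler so the publisher no longer keeps this object alive.
        _publisher.Changed -= OnChanged;
        _publisher = null;
    }
}
```

After Dispose runs, the publisher no longer references the subscriber, so the subscriber becomes eligible for collection once the caller drops its own reference.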