We have an application running on 5 server nodes (16 cores, 128 GB of memory each) that loads almost 70 GB of data on each machine. This application is distributed and serve
In .NET 4.5, the CLR team enhanced large object heap (LOH) allocation. Even then, they still recommend object pooling to help large object performance. It sounds like LOH fragmentation happens less often in 4.5, but it can still happen. From the stack trace, though, this problem looks unrelated to the LOH.
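To illustrate what object pooling for large buffers can look like, here is a minimal sketch. LargeBufferPool, Rent, and Return are hypothetical names of my own, not a framework API, and the sketch assumes every caller wants buffers of one fixed size. Arrays of 85,000 bytes or more are allocated on the LOH, so reusing them avoids repeated LOH allocations.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper (not part of the .NET Framework): a thread-safe pool
// of fixed-size byte arrays, intended for buffers large enough to land on the LOH.
public sealed class LargeBufferPool
{
    private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public LargeBufferPool(int bufferSize)
    {
        if (bufferSize <= 0)
        {
            throw new ArgumentOutOfRangeException("bufferSize");
        }
        _bufferSize = bufferSize;
    }

    public int BufferSize
    {
        get { return _bufferSize; }
    }

    // Hand out a pooled buffer if one is available; otherwise allocate a new one.
    public byte[] Rent()
    {
        byte[] buffer;
        return _buffers.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    // Take a buffer back for reuse. Null or wrong-sized buffers are ignored
    // and left for the garbage collector.
    public void Return(byte[] buffer)
    {
        if (buffer != null && buffer.Length == _bufferSize)
        {
            Array.Clear(buffer, 0, buffer.Length); // do not leak old contents
            _buffers.Add(buffer);
        }
    }
}
```

A caller would call Rent() where it previously wrote new byte[size], and call Return() in a finally block when done. This pool never shrinks, so a real one would probably want an upper bound on how many buffers it retains.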
Daniel Lane suggested GC deadlocks. We have seen those happen on production systems, too, and they definitely cause issues with process size and out of memory conditions.
One thing you could do is run the Debug Diagnostics Tool, capture a full dump when the OutOfMemoryException occurs, and then have the tool analyze the dump for crash and memory information. I've seen some interesting things turn up about both native and managed heaps in this report. For example, we found that a printer driver had allocated 1 GB of unmanaged heap on a 32-bit system. Updating the driver fixed the issue. Granted, that was a client system, but something similar could be happening on your server.
I agree that this sounds like a native mode error. Looking at the implementation of System.Threading.Monitor.Wait, ObjWait, PulseAll, and ObjPulseAll in the .NET 4.5 Reference Code reveals that these methods call into native code:
/*========================================================================
** Sends a notification to all waiting objects.
========================================================================*/
[System.Security.SecurityCritical] // auto-generated
[ResourceExposure(ResourceScope.None)]
[MethodImplAttribute(MethodImplOptions.InternalCall)]
private static extern void ObjPulseAll(Object obj);

[System.Security.SecuritySafeCritical] // auto-generated
public static void PulseAll(Object obj)
{
    if (obj == null)
    {
        throw new ArgumentNullException("obj");
    }
    Contract.EndContractBlock();

    ObjPulseAll(obj);
}
A comment by Mike Dimmick on Raymond Chen's article "PulseEvent is fundamentally flawed" says:
Monitor.PulseAll is a wrapper around Monitor.ObjPulseAll, which is an internal call to the CLR internal function ObjectNative::PulseAll. This in turn wraps ObjHeader::PulseAll, which wraps SyncBlock::PulseAll. This simply sits in a loop calling SetEvent until no more threads are waiting on the object.
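For reference, this is roughly what the managed side of that call chain looks like in use. It is only a sketch: PulseAllDemo and Run are hypothetical names, and it does not reproduce the OutOfMemoryException. It just shows several threads blocked in Monitor.Wait being released by a single Monitor.PulseAll.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, written for illustration only.
public static class PulseAllDemo
{
    // Starts waiterCount threads that block in Monitor.Wait, wakes them all
    // with one Monitor.PulseAll, and returns how many of them resumed.
    public static int Run(int waiterCount)
    {
        object gate = new object();
        bool ready = false;
        int waiting = 0;
        int woken = 0;
        var threads = new Thread[waiterCount];

        for (int i = 0; i < waiterCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                lock (gate)
                {
                    waiting++;
                    Monitor.PulseAll(gate); // let the main thread recheck the count
                    while (!ready)
                    {
                        Monitor.Wait(gate); // releases the lock while blocked
                    }
                    woken++;
                }
            });
            threads[i].Start();
        }

        lock (gate)
        {
            // Wait until every waiter has checked in and is blocked in Monitor.Wait.
            while (waiting < waiterCount)
            {
                Monitor.Wait(gate);
            }
            ready = true;
            Monitor.PulseAll(gate); // one call moves all waiters to the ready queue
        }

        foreach (Thread t in threads)
        {
            t.Join();
        }
        return woken;
    }
}
```

Calling PulseAllDemo.Run(4) should return 4. Every Wait sits in a loop that rechecks its condition, because a woken thread still has to reacquire the lock before it continues and may have been woken by a pulse meant for someone else.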
If anyone has access to the source code for the CLI, maybe they could post more about this function and where the memory error could be coming from.