We have an application running on 5 server nodes (16 cores and 128 GB of memory each) that loads almost 70 GB of data on each machine. This application is distributed and serve
An educated guess, without seeing your code, is that you have an STA deadlock on finalisation, especially since this looks like a high-concurrency system judging by your hefty hardware requirements. Given that you've already tried forcing a GC, a deadlock makes sense: if finalisation is deadlocked, the GC isn't going to be able to do its job. Hope this helps.
Advanced Techniques to Prevent and Detect Deadlocks in .Net Applications
Specifically, the section of interest is the one quoted below:
When your code is executing on a single-threaded apartment (STA) thread, the equivalent of an exclusive lock occurs. Only one thread can update a GUI window or run code inside an Apartment-threaded COM component inside an STA at once. Such threads own a message queue into which to-be-processed information is placed by the system and other parts of the application. GUIs use this queue for information such as repaint requests, device input to be processed, and window close requests. COM proxies use the message queue to transition cross-Apartment method calls into the apartment for which a component has affinity. All code running on an STA is responsible for pumping the message queue (looking for and processing new messages using the message loop); otherwise the queue can become clogged, leading to lost responsiveness. In Win32 terms, this means using the MsgWaitForSingleObject, MsgWaitForMultipleObjects (and their Ex counterparts), or CoWaitForMultipleHandles APIs. A non-pumping wait such as WaitForSingleObject or WaitForMultipleObjects (and their Ex counterparts) won't pump incoming messages.
In other words, the STA "lock" can only be released by pumping the message queue. Applications that perform operations whose performance characteristics vary greatly on the GUI thread without pumping for messages, like those noted earlier, can easily deadlock. Well-written programs either schedule such long-running work to occur elsewhere, or pump for messages each time they block to avoid this problem. Thankfully, the CLR pumps for you whenever you block in managed code (via a call to a contentious Monitor.Enter, WaitHandle.WaitOne, FileStream.EndRead, Thread.Join, and so forth), helping to mitigate this problem. But plenty of code—and even some fraction of the .NET Framework itself—ends up blocking in unmanaged code, in which case a pumping wait may or may not have been added by the author of the blocking code.
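To make the pumping idea concrete, here is a minimal toy model in Python. It is not COM, Win32, or the CLR: `ToySta`, `post_call`, `pumping_wait`, `non_pumping_wait`, and `run_scenario` are all made-up names for illustration, and the short timeouts stand in for what would be an indefinite hang in a real process. The "STA" thread waits for a worker to finish, but the worker first needs a call serviced on the STA.

```python
import queue
import threading


class ToySta:
    """Toy stand-in for an STA: one owning thread, one message queue."""

    def __init__(self):
        self.messages = queue.Queue()

    def post_call(self, func):
        """Called from another thread: queue func, return an event set once it has run."""
        done = threading.Event()
        self.messages.put((func, done))
        return done

    def pump_once(self, timeout):
        """Process at most one queued message; return True if one was processed."""
        try:
            func, done = self.messages.get(timeout=timeout)
        except queue.Empty:
            return False
        func()
        done.set()
        return True

    def pumping_wait(self, event, poll=0.01):
        """Wait for event, but keep servicing the queue in the meantime."""
        while not event.is_set():
            self.pump_once(poll)

    def non_pumping_wait(self, event, timeout):
        """Wait for event only; queued messages are ignored while waiting."""
        return event.wait(timeout)


def run_scenario(pump):
    """The calling thread plays the STA; a worker needs a call serviced on it."""
    sta = ToySta()
    log = []
    work_finished = threading.Event()

    def worker():
        # The worker cannot finish until the STA has run its callback.
        call_done = sta.post_call(lambda: log.append("callback ran on STA"))
        if call_done.wait(timeout=0.2):  # timeout stands in for "hangs forever"
            work_finished.set()

    t = threading.Thread(target=worker)
    t.start()
    if pump:
        sta.pumping_wait(work_finished)
    else:
        sta.non_pumping_wait(work_finished, timeout=0.4)
    t.join()
    return work_finished.is_set(), log
```

With `run_scenario(pump=True)` the callback is serviced and the work completes. With `run_scenario(pump=False)` the STA and the worker each wait on the other until the timeouts expire, which is the deadlock shape described above.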
Here's a classic example of an STA-induced deadlock. A thread running in an STA generates a large quantity of Apartment threaded COM component instances and, implicitly, their corresponding Runtime Callable Wrappers (RCWs). Of course, these RCWs must be finalized by the CLR when they become unreachable, or they will leak. But the CLR's finalizer thread always joins the process's Multithreaded Apartment (MTA), meaning it must use a proxy that transitions to the STA in order to call Release on the RCWs. If the STA isn't pumping to receive the finalizer's attempt to invoke the Finalize method on a given RCW—perhaps because it has chosen to block using a non-pumping wait—the finalizer thread will be stuck. It is blocked until the STA unblocks and pumps. If the STA never pumps, the finalizer thread will never make any progress, and a slow, silent build-up of all finalizable resources will occur over time. This can, in turn, lead to a subsequent out-of-memory crash or a process-recycle in ASP.NET. Clearly, both outcomes are unsatisfactory. High-level frameworks like Windows Forms, Windows Presentation Foundation, and COM hide much of the complexity of STAs, but they can still fail in unpredictable ways, including deadlocking. COM synchronization contexts introduce similar, but subtly different, challenges. And furthermore, many of these failures will only occur in a small fraction of test runs and often only under high stress.
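The finalizer build-up described in that last paragraph can also be sketched as a toy simulation. Again, this is only a model in Python, not the CLR: `simulate_finalizer`, the `"rcw"` item, and the queues are hypothetical names, and the observation window replaces a real, unbounded hang. A single finalizer thread works through its queue in order; the first item needs a call serviced by the toy STA, and everything behind it has to wait.

```python
import queue
import threading
import time


def simulate_finalizer(sta_pumps, n_objects=5, observe_for=0.5):
    """Return how many of n_objects were finalized within observe_for seconds."""
    sta_inbox = queue.Queue()           # calls marshaled to the toy STA
    finalization_queue = queue.Queue()  # objects awaiting finalization
    finalized = []
    stop = threading.Event()

    def sta_thread():
        # A pumping STA services its inbox; a non-pumping one never reads it.
        while not stop.is_set():
            if sta_pumps:
                try:
                    release, done = sta_inbox.get(timeout=0.01)
                except queue.Empty:
                    continue
                release()
                done.set()
            else:
                stop.wait(0.01)  # non-pumping wait: the inbox is ignored

    def finalizer_thread():
        # One finalizer thread handles objects strictly one after another.
        while not stop.is_set():
            try:
                name, needs_sta = finalization_queue.get(timeout=0.01)
            except queue.Empty:
                continue
            if needs_sta:
                done = threading.Event()
                sta_inbox.put((lambda: None, done))
                # Stuck here until the STA services the call (or the demo ends).
                while not done.is_set() and not stop.is_set():
                    done.wait(0.01)
                if not done.is_set():
                    return
            finalized.append(name)

    # The first object is RCW-like (needs the STA); the rest are ordinary.
    finalization_queue.put(("rcw", True))
    for i in range(n_objects - 1):
        finalization_queue.put((f"obj{i}", False))

    threads = [threading.Thread(target=sta_thread),
               threading.Thread(target=finalizer_thread)]
    for t in threads:
        t.start()
    time.sleep(observe_for)
    stop.set()
    for t in threads:
        t.join()
    return len(finalized)
```

In this model `simulate_finalizer(sta_pumps=True)` finalizes all five objects, while `simulate_finalizer(sta_pumps=False)` finalizes none: the ordinary objects queued behind the RCW-like one never get their turn. That mirrors why forcing a GC would not reclaim anything if the real finalizer thread is stuck in the same way.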