How can I prove there is I/O pending behind a thread pool worker thread?

Question

One of our applications has hundreds of thread pool threads sitting idle with the call stack below after a network outage.

Before the outage there were only a few dozen threads. After it, the thread count rose to around 400 within a very short time and stayed at that level for a very long time, until we rebooted the server.

00000020`a864fc58 00007fff`d4ea1118 ntdll!NtWaitForSingleObject+0xa
00000020`a864fc60 00007fff`ce50ce66 KERNELBASE!WaitForSingleObjectEx+0x94
00000020`a864fd00 00007fff`ce50d247 clr!CLRSemaphore::Wait+0x8a
00000020`a864fdc0 00007fff`ce50d330 clr!ThreadpoolMgr::UnfairSemaphore::Wait+0x109
00000020`a864fe00 00007fff`ce5de8b6 clr!ThreadpoolMgr::WorkerThreadStart+0x1b9
00000020`a864fea0 00007fff`d60613d2 clr!Thread::intermediateThreadProc+0x7d
00000020`a864fee0 00007fff`d7be5454 kernel32!BaseThreadInitThunk+0x22
00000020`a864ff10 00000000`00000000 ntdll!RtlUserThreadStart+0x34

1. Why did the thread count jump to hundreds right after the network went down? What are the possible causes?

As I understand it (and as the !WorkerSemaphore->Wait(...) code below suggests), a worker thread exits after 20 seconds if no task is assigned to it. Judging from the call stack, these threads were simply waiting for work with nothing assigned, yet they were never destroyed. After reading the CoreCLR source, I noticed that a thread is not destroyed after the 20-second timeout if it has I/O pending.

2. So how can I check whether these threads have I/O pending by examining the user-mode dump in WinDbg?

https://github.com/dotnet/coreclr/blob/master/src/vm/win32threadpool.cpp

RetryWaitForWork:
if (!WorkerSemaphore->Wait(AppX::IsAppXProcess() ? WorkerTimeoutAppX : WorkerTimeout))
{
    if (!IsIoPending())
    {      
        DangerousNonHostedSpinLockHolder tal(&ThreadAdjustmentLock);

        counts = WorkerCounter.GetCleanCounts();
        while (true)
        {
            if (counts.NumActive == counts.NumWorking)
            {
                goto RetryWaitForWork;
            }

            newCounts = counts;
            newCounts.NumActive--;

            // if we timed out while active, then Hill Climbing needs to be told that we need fewer threads
            newCounts.MaxWorking = max(MinLimitTotalWorkerThreads, min(newCounts.NumActive, newCounts.MaxWorking));

            oldCounts = WorkerCounter.CompareExchangeCounts(newCounts, counts);

            if (oldCounts == counts)
            {
                HillClimbingInstance.ForceChange(newCounts.MaxWorking, ThreadTimedOut);
                goto Exit;
            }

            counts = oldCounts;
        }
    }
    else
    {
        goto RetryWaitForWork;
    }
}
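To make the retirement decision in the excerpt above easier to reason about, here is a minimal, portable C++ sketch of it. It is a simplified model, not the runtime's code: the Counts struct, the TimeoutAction enum, and the onWaitTimeout function are hypothetical names, the ioPending flag stands in for whatever IsIoPending() returns on the timed-out thread, and the lock, the compare-exchange retry loop, and the Hill Climbing notification are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-in for the runtime's worker counters.
struct Counts {
    int numActive;   // worker threads the pool currently counts as active
    int numWorking;  // worker threads currently executing a work item
    int maxWorking;  // current concurrency target
};

// What a worker does after its semaphore wait times out.
enum class TimeoutAction { RetryWait, Exit };

// Models one pass through the timeout branch of the excerpt.
// ioPending: assumed result of IsIoPending() for this thread.
// minLimit:  stands in for MinLimitTotalWorkerThreads.
TimeoutAction onWaitTimeout(Counts& counts, bool ioPending, int minLimit) {
    // A thread with I/O pending never retires; it goes back to waiting.
    if (ioPending) {
        return TimeoutAction::RetryWait;
    }
    // Every active thread is busy, so none can be retired right now.
    if (counts.numActive == counts.numWorking) {
        return TimeoutAction::RetryWait;
    }
    // Retire this thread and lower the concurrency target if needed,
    // without going below the configured minimum.
    counts.numActive--;
    counts.maxWorking =
        std::max(minLimit, std::min(counts.numActive, counts.maxWorking));
    return TimeoutAction::Exit;
}
```

In this model, a pool with 400 idle workers shrinks by one thread per timeout as long as ioPending is false. If ioPending is true for a thread, that thread returns RetryWait on every timeout and stays alive indefinitely, which matches the symptom described in the question.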

By the way, according to !runaway 0x4, these threads were created right when the network went down and have been alive for more than 10 hours.

I just need a way to find out why these threads were never destroyed even though no work was assigned to them for more than 20 seconds. The only explanation I can think of is that they have I/O pending, but how can I prove it?

Source: https://stackoverflow.com/questions/44617106/how-to-prove-there-is-io-pending-behind-the-thread-pool-worker-thread

Tags

debugging

windbg

coreclr