问题
I stucked into a problem with threads syncronization and critical sections on Windows 10.
Application will crash in this case:
- Application has two threads.
- Thread 1 calls EnterCriticalSection with object m_CS
- Thread 2 then attempts to enter the same critical section
- Thread 1 terminates Thread 2 using TerminateThread
- Thread 1 calls LeaveCriticalSection
In previous Windows versions which I was able to test (7, 8, 8.1) this works properly. Thread 2 terminates, and Thread 1 leaves the critical section without exception.
On Windows 10, when Thread 1 leaves the critical section, application crashes with Access Violation. It only happens when another thread was terminated while waiting on EnterCriticalThread.
Looking at the stack trace it looks this (latest frame at the top):
RtlpWakeByAddress
RtlpUnWaitCriticalSection
RtlLeaveCriticalSection
I spent so much time on debugging this issue. In my case m_CS is totally fine when LeaveCriticalSection was called. I debugged and spent some time to analyze disassembled code of ntdll.dll functions. Seems like object corrupts somewhere during execution of RtlpUnWaitCriticalSection and then passed to RtlpWakeByAddress when crash occurs. Basicly ntdll.dll was able to modify CRITICAL_SECTION object's properties such as lock count in RtlLeaveCriticalSection.
From the web I didn't find any answer on this or statement what changed in Windows 10. Only thread on reddit and ~1800 crash reports for Mozilla Firefox with same call stack in the last month. I contacted with author of post on reddit and he was not able to fix this thus far.
So anybody dealed with this issue and may be have a fix for it or advices? As a solution right now I only see to rethink usage of WinAPI TerminateThread and try to avoid it as much as possible. Another way probably to do a code refactoring and think on application's architecture.
Any response appreciated. Thanks in advance
回答1:
Implementation of CRITICAL_SECTION
very volatile from version to version. when in last Windows version thread begin wait on CRITICAL_SECTION
he call WaitOnAddress function. ok, really it ntdll internal implementation - RtlpWaitOnAddress
, but this not change gist. this function internal call RtlpAddWaitBlockToWaitList
- and here the key point - WaitBlock is allocated on thread stack and pointer to this wait block is added to List. then when owner of CRITICAL_SECTION
leave he call WakeByAddressSingle (really it internal implementation RtlpWakeByAddress
) and this function pop the first WaitBlock from List, extract thread Id from it and call NtAlertThreadByThreadId
(new api from win 8.1) - for awaken some thread waited in EnterCriticalSection
. but when you terminated thread, waited in EnterCriticalSection
- his stack is deallocated. so address of WaitBlock block become invalid. so thread which called RtlpWakeByAddres
s (as part of LeaveCriticalSection
) got access violation when try read thread Id from WaitBlock (died thread stack).
conclusion - if you call TerminatedThread
- process already become in unstable state, bug can be at any time and any point. so - not call this function, especially from self process.
回答2:
Thread 1 terminates Thread 2 using TerminateThread
Don't do that. It may look like it works on other windows versions, but there's no way for you to know for sure what side-effects are occurring and hiding from you.
From https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717(v=vs.85).aspx
TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:
- If the target thread owns a critical section, the critical section will not be released.
- If the target thread is allocating memory from the heap, the heap lock will not be released.
- If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
- If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL.
What you should do is communicate with thread 2 and let thread 2 shut itself down correctly and safely.
回答3:
I would change the code of thread 2 to use TryEnterCriticalSection
if(!TryEnterCriticalSection(&m_CS)) {
return 0; // Terminate thread
}
//code
LeaveCriticalSection(&m_CS);
This has the advantage that thread 2 is not waiting on the critical section, and it can terminate itself properly. It is generally not advisable to use TerminateThread
, as already mentioned by others in the comments.
来源:https://stackoverflow.com/questions/39635817/windows-10-specific-crash-on-call-leavecriticalsection