exit fails to set error code

问题

I have a C++ Windows program that fails to set the exit code. The program is very complex and I'm currently unable to reproduce this with a simple test case. I do know that the program calls exit(1) because I have a breakpoint on that line. Immediately after I step over it, the debugger (VS2010) prints The program program.exe has exited with code 0 (0x0). When I run it from the shell, %ERRORLEVEL% is also set to 0.

I use subsystem:console and plain old main (no WinMain).

This only happens on Windows Server 2008 R2, not on my Windows 8.1 laptop. I'm running the same executable on both.

I have tried to use exit, _exit, ExitProcess, and return (the offending call is in main), but none of those seem to have any effect. I also have tried to return other codes, also with no result.

There's a similar question but I cannot reproduce the results described in it. My program does use threads.

How can I approach debugging this issue? I'm rather baffled.

回答1:

I have tried to use exit, _exit, ExitProcess, and return

You've eliminated all reasonable explanations, particularly with ExitProcess(). There is only one possibility left, you need to try TerminateProcess(). If that still doesn't set the exit code then you need to shove that machine out of a 4th story window.

But with the expectation that it now works. The difference between ExitProcess() and TerminateProcess() is that the former ensures that all DLLs are notified by the termination. Their DllMain() function gets called with fdwReason = DLL_PROCESS_DETACH. Which gives a DLL the opportunity to do something icky like calling Exit/TerminateProcess() itself, thus screwing up the exit code.

Finding such a DLL can be difficult if you don't have all the source code. Could be an injected one as well, there are entirely too many around these days. Best thing to do is to set a breakpoint on system call so you can catch it in the act, you probably want to do this regardless.

Once you step into main(), use Debug > New Breakpoint > Break at Function and enter {,,ntdll.dll}_NtTerminateProcess@8. Press F5 and the debugger now stops just before the program terminates. Look at the Call Stack to find the evil-doer.

回答2:

Strange symptoms involving exit(), _exit(), ExitProcess(), and others in a multithreaded program - particularly if the symptoms vary between hosts - have a smell of a variable being modified or accessed by different threads, without synchronisation.

Looking at the other thread you linked to, it appears you are using a volatile variable to communicate between threads, but not using any form of synchronisation (for example, code which accesses the value of that variable and code that modifies that value need to cooperate via means of a critical section, mutex, or comparable construct).

That little bit of indirect evidence makes the smell even stronger.

The basic problem I suspect is that declaring a variable as volatile is neither necessary nor sufficient to ensure that variable always has values that will make sense to your program. In particular, it is not sufficient to prevent a thread which is modifying a variable from being preempted when the modification is only partly complete, and for another thread to attempt accessing or modifying the affected variable.

If you look up some articles by Herb Sutter (particularly those concerned with thread synchronisation in his "Guru of the Week" series) you will find detailed explanations of why that is so. Other authors also describe such things, but Sutter's articles are ones that I recall offhand.

The solution is to introduce some means of synchronisation, and for EVERY thread in your program to religiously use it before accessing or modifying variables shared between them. This avoids the various problems (race conditions, operations being preempted partway through) that would cause symptoms like you describe.

Such problems are rarely caught by stepping through with a debugger. The reason for that is that the symptoms are an emergent property. Several unlikely and often independent occurrences, in disparate threads of execution, must occur together. Debuggers do typically change the timing of events in programs, and timing is a critical consideration in the symptoms emerging.

Options include making key variables atomic (so particular operations cannot be preempted), critical sections (where the threads explicitly cooperate within a program), or mutexes (which, depending on definition, allows threads in different programs to explicitly cooperate before accessing shared memory).

Yes, this introduces a bottleneck in your program - a point where every thread must rendezvous and potentially wait for each other. That can affect throughput of your program. Some people advocate using volatile variables to avoid such concerns. More often than not, the result is intermittent symptoms in long running programs like you have described in this question and the "similar question" you linked to.

It doesn't matter whether you use standard means of synchronisation (e.g. introduced in C++11) or windows specific means (WIN API functions). The important thing is that you use a deliberate synchronisation method, rather than just making variables volatile. Different options for synchronisation have different trade-offs, so you will need to make a decision relevant to needs of your program.

Another consideration is to signal all threads so they close cleanly, wait until they are all closed, capture their exit codes, and THEN exit the program. It is often less error prone to do this in the thread running main() - which ultimately starts the process, so is more likely to have access to information it needs to cleanup correctly. If another thread decides the program needs to exit, then better if it communicates that need back to main() to do it.

来源：https://stackoverflow.com/questions/28265707/exit-fails-to-set-error-code

标签

c++

windows

windows-server-2008

exit-code