I am getting random crashes in my C++ application. It may not crash for a month and then crash 10 times in an hour. Sometimes it may crash on launch, while sometimes it may
The first thing I would do is debug the core dump with gdb (both Windows and Linux). The second would be running a program like Lint, Prefast (Windows), Clang Analyzer or some other static analysis program (be prepared for a lot of false positives). The third would be some kind of runtime check, like Valgrind (or its close variants), Microsoft Application Verifier, or Google Perftools.
And logging, which doesn't have to go to disk. You could, for instance, log to a global std::list<std::string> which would be pruned to the last 100 entries. When an exception is caught, display the contents of that list.
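A minimal sketch of that in-memory log, assuming a single global instance and a mutex so several threads can log safely. The names RecentLog and g_recentLog are made up for illustration, and the capacity of 100 simply follows the number mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <list>
#include <mutex>
#include <string>

// Hypothetical helper: keeps only the most recent `capacity` log lines in memory.
class RecentLog {
public:
    explicit RecentLog(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Append a message and prune the oldest entries beyond the capacity.
    void add(const std::string& message) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.push_back(message);
        while (entries_.size() > capacity_) {
            entries_.pop_front();
        }
    }

    // Write the retained entries, oldest first, to the given stream.
    void dump(std::ostream& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const std::string& entry : entries_) {
            out << entry << '\n';
        }
    }

    // Number of entries currently retained.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    std::size_t capacity_;
    mutable std::mutex mutex_;
    std::list<std::string> entries_;
};

// Global instance holding the last 100 entries.
RecentLog g_recentLog(100);
```

In a top-level catch block you would call something like g_recentLog.dump(std::cerr) before exiting. Note that this only helps when the failure surfaces as a C++ exception. A segmentation fault never reaches a catch block, so for those you still need a core dump.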
That sounds like something tricky like a race condition.
I'd suggest you create a debug build and use that. You should also make sure that a core dump is created when the program crashes.
The next time the program crashes, you can launch gdb on the core dump and see where the problem lies. It'll probably be a follow-on fault (a symptom of damage done earlier rather than the root cause), but this should get you started.
Run the application on Linux under valgrind to look for memory errors. Random crashes are usually down to corrupting memory.
Fix every error you find with valgrind's memcheck tool, and then hopefully the crash will go away.
If the whole program takes too long to run under valgrind, then split off functionality into unit tests and run those under valgrind. Hopefully you'll find the memory errors that are causing the problems.
If it doesn't, then make sure core dumps are enabled (check with ulimit -a; ulimit -c unlimited lifts the size limit for the current shell), and then when it crashes you'll be able to find out where with gdb.
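If you cannot control the shell the program is started from, the process can raise its own core file limit at startup. This is a rough sketch for POSIX systems only. The function name enableCoreDumps is invented here, and where the core file actually ends up still depends on system configuration (on Linux, the kernel's core_pattern setting).

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper: raise the soft core-file size limit to the hard limit
// for the current process. Returns true on success. POSIX only.
bool enableCoreDumps() {
    struct rlimit limit;
    if (getrlimit(RLIMIT_CORE, &limit) != 0) {
        return false;
    }
    // An unprivileged process may raise its soft limit up to the hard limit.
    limit.rlim_cur = limit.rlim_max;
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}
```

Call it early in main(). If the hard limit itself is 0, this cannot help, and the limit has to be changed by whoever launches the process.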
If your application is not Windows specific, you may try compiling and running your program on other platforms such as Linux (different distributions, 32/64 bits, ... if you have the luxury). That may help trigger the bugs in your program. Of course you should use the tools mentioned in other posts such as gdb, valgrind, etc.
Two more pointers/ideas (besides core dump and valgrind on Linux):
1) Try Nokia's "Qt Creator". It supports mingw and can act as post-mortem debugger.
2) If it's feasible, maybe just run the application in gdb constantly?
These sorts of bugs are always tricky - unless you can reproduce the error, your only option is to make changes to your application so that extra information is logged, and then wait until the error happens again in the wild.
There is an excellent tool called Process Dumper that you can use to obtain a crash dump of a process that experiences an exception or exits unexpectedly - you could ask users to install that and configure rules for your application.
Alternatively if you don't want to ask users to install other applications you could have your application monitor for exceptions and create a dump itself by calling MiniDumpWriteDump.
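As a rough illustration of the MiniDumpWriteDump route, the sketch below installs an unhandled exception filter that writes a dump file. It is an assumption-laden example and not a hardened implementation: the names installCrashDumpHandler and writeDumpOnCrash and the path crash.dmp are made up, the #pragma is MSVC-specific, and on non-Windows builds the function does nothing.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "Dbghelp.lib")  // MSVC-specific; link dbghelp explicitly elsewhere

// Hypothetical output path, chosen for illustration only.
static const wchar_t* kDumpPath = L"crash.dmp";

// Called by the OS when an exception is not handled anywhere else.
static LONG WINAPI writeDumpOnCrash(EXCEPTION_POINTERS* exceptionInfo) {
    HANDLE file = CreateFileW(kDumpPath, GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION info;
        info.ThreadId = GetCurrentThreadId();
        info.ExceptionPointers = exceptionInfo;
        info.ClientPointers = FALSE;
        // MiniDumpNormal keeps the file small: stacks and basic module info.
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &info, nullptr, nullptr);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;  // let the process terminate afterwards
}
#endif

// Returns true if the handler was installed, false on non-Windows builds.
bool installCrashDumpHandler() {
#ifdef _WIN32
    SetUnhandledExceptionFilter(writeDumpOnCrash);
    return true;
#else
    return false;
#endif
}
```

Call installCrashDumpHandler() once at startup. Writing the dump from inside the crashing process can itself fail if the heap or stack is badly corrupted, which is one reason an external tool such as the one mentioned above can be more reliable.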
The other option is to improve the logging. However, figuring out what information to log (without just logging everything) can be tricky, so it can take several iterations of crashing and then changing the logging to hunt down the problem.
As I said, these sorts of bugs are always tricky to diagnose - in my experience it generally involves hours and hours of peering through logs and crash dumps until suddenly you get that eureka moment where everything makes sense - the key is collecting the right information.