We have a bug in our application that does not occur every time and therefore we don\'t know its \"logic\". I don\'t even get it reproduced in 100 times today.
Disclaime
hire some testers!
Use an enhanced crash reporter. In the Delphi environment, we have EurekaLog and MadExcept. Other tools exist in other environments. Or you can diagnose the core dump. You're looking for the stack trace, which will show you where it's blowing up, how it got there, what's in memory, etc.. It's also useful to have a screenshot of the app, if it's a user-interaction thing. And info about the machine that it crashed on (OS version and patch, what else is running at the time, etc..) Both of the tools that I mentioned can do this.
If it's something that happens with a few users but you can't reproduce it, and they can, go sit with them and watch. If it's not apparent, switch seats - you "drive", and they tell you what to do. You'll uncover the subtle usability issues that way. double-clicks on a single-click button, for example, initiating re-entrancy in the OnClick event. That sort of thing. If the users are remote, use WebEx, Wink, etc., to record them crashing it, so you can analyze the playback.
Add pre and post condition check in methods related to this bug.
You may have a look at Design by contract
Read the stack trace carefully and try to guess what could be happened; then try to trace\log every line of code that potentially can cause trouble.
Keep your focus on disposing resources; many sneaky sporadical bugs i found were related to close\dispose things :).
Analyze the problem in a pair and pair-read the code. Make notes of the problems you KNOW to be true and try to assert which logical preconditions must hold true for this happen. Follow the evidence like a CSI.
Most people instinctively say "add more logging", and this may be a solution. But for a lot of problems this just makes things worse, since logging can change timing-dependencies sufficiently to make the problem more or less frequent. Changing the frequency from 1 in 1000 to 1 in 1,000,000 will not bring you closer to the true source of the problem.
So if your logical reasoning does not solve the problem, it'll probably give you a few specifics you could investigate with logging or assertions in your code.
Since this is language-agnostic, I'll mention a few axioms of debugging.
Nothing a computer ever does is random. A 'random occurrence' indicates a as-yet-undiscovered pattern. Debugging begins with isolating the pattern. Vary individual elements and assess what makes a change in the behaviour of the bug.
Different user, same computer? Same user, different computer? Is the occurrence strongly periodic? Does rebooting change the periodicity?
FYI- I once saw a bug that was experienced by a single person. I literally mean person, not a user account. User A would never see the problem on their system, User B would sit down at that workstation, signed on as User A and could immediately reproduce the bug. There should be no conceivable way for the app to know the difference between the physical body in the chair. However-
The users used the app in different ways. User A habitually used a hotkey to to invoke a action and User B used an on-screen control. The difference in the user behaviour would cascade into a visible error a few actions later.
ANY difference that effects the behaviour of the bug should be investigated, even if it makes no sense.