I use logging as a way to narrow down issues that don't reproduce in our unit tests, let alone by repeating the same steps provided by the user: those rare glitches that only show up on some very remote hardware (and sometimes, albeit very rarely, are even caused by a driver or third-party library glitch outside of our control).
I agree with the comment that this should all be caught by our testing procedure, but it's difficult for a million+ LOC codebase that demands very low-level, performance-critical code to ever meet that bar. I don't work in mission-critical software, but I do work in the graphics industry, where we often have to do everything from implementing memory allocators to writing GPU code to using SIMD.
Even with very modular, loosely-coupled or even completely decoupled code, the interactions between systems can lead to very complex inputs and outputs, with behavior varying between platforms, and occasionally we hit that rogue edge case which eludes our tests. The modular black boxes themselves can be very simple, but the interactions between them can get complex enough to produce the occasional unanticipated edge case.
As an example of a case where logging saved my butt: one time we had an odd user with a prototype Intel machine that was crashing. Our minimum requirements listed machines that should support SSE 4, but this particular machine met those requirements and still did not support Streaming SIMD Extensions past SSE 3, in spite of being a 16-core machine. Discovering that quickly was made possible by looking at his log, which showed precisely the line number where the SSE 4 instructions were used. None of us on the team could reproduce the issue, nor could a single other user who participated in verifying the report. Ideally we should have written fallback code for older SIMD versions, or at least done some branching and checking to make sure the hardware actually supported the minimum requirements, but we wanted to make a firm assumption, communicated through the minimum hardware requirements, for simplicity and economy. Here, perhaps, it's arguable that it was our minimum system requirements that had the "glitch".
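For what it's worth, the branching and checking we skipped is not much code. Here's a minimal sketch of runtime SSE 4.1 detection via CPUID, assuming x86 and either MSVC or a GCC/Clang-style compiler (cpu_supports_sse41 is just an illustrative name, not our actual function):

```cpp
#include <cstdio>
#include <cstdlib>

#if defined(_MSC_VER)
    #include <intrin.h>
#else
    #include <cpuid.h>
#endif

// Query CPUID leaf 1 and test ECX bit 19 (SSE 4.1 support).
static bool cpu_supports_sse41()
{
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0}; // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[2] & (1 << 19)) != 0;
#else
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & (1u << 19)) != 0;
#endif
}

int main()
{
    if (!cpu_supports_sse41())
    {
        // Fail loudly (and log it) rather than crash later on an
        // illegal instruction deep inside some SIMD routine.
        std::fprintf(stderr, "This CPU does not support SSE 4.1.\n");
        return EXIT_FAILURE;
    }
    std::printf("SSE 4.1 available.\n");
    return 0;
}
```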
Given the way I use logging here, we tend to get fairly large logs. However, the goal is not readability -- what typically matters is the last line of a log sent in with a report when the user experiences a crash that none of us on the team (and few other users in the world) can reproduce.
Nevertheless, one trick I employ regularly to avoid excessive log spamming rests on the assumption that a piece of code which executes once successfully will also do so on subsequent calls (not a hard guarantee, but often a reasonable one). So I often employ a log_once kind of function for granular functions, to avoid paying the cost of logging every time they are called.
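A minimal sketch of what such a helper might look like in C++ (the LOG_ONCE name and the raw fprintf sink are my own placeholders; a real version would forward to whatever logging backend the codebase uses):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdio>

// Logs a message the first time a given call site is reached, then stays
// silent, so granular functions called millions of times don't spam the log.
// The function-local static is unique to each expansion of the macro.
#define LOG_ONCE(...)                                                   \
    do {                                                                \
        static std::atomic<bool> already_logged(false);                 \
        if (!already_logged.exchange(true)) {                           \
            std::fprintf(stderr, "[%s:%d] ", __FILE__, __LINE__);       \
            std::fprintf(stderr, __VA_ARGS__);                          \
            std::fprintf(stderr, "\n");                                 \
        }                                                               \
    } while (0)

// Example: a hot, granular function that only logs on its first call.
void transform_vertices(const float* positions, std::size_t count)
{
    LOG_ONCE("transform_vertices: first call, count=%zu", count);
    (void)positions; // ... actual per-vertex work would go here ...
}
```

Since the flag lives at the call site, the same helper can be dropped into many functions without them interfering with each other.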
I don't sprinkle log outputs all over the place (I might if I had the time). Typically I reserve them for the areas that seem the most dangerous: code invoking GLSL shaders (GPU vendors vary wildly here in terms of capability and even how they compile the code), code using SIMD intrinsics, very low-level code, code that inevitably has to rely on OS-specific behavior, low-level code making assumptions about the representation of PODs (e.g. code that assumes 8 bits to a byte) -- the kind of cases where we would likewise sprinkle a lot of assertions and sanity checks, as well as write the largest number of unit tests. Typically this is enough, and logging has saved my butt many times when I would otherwise have been left taking blind stabs at an unreproducible issue, requiring many iterations of bouncing attempted fixes to the one user in the world who could reproduce the problem.
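As a rough illustration of the kind of assertions and sanity checks that go hand in hand with that logging (the Vertex struct and the specific checks here are just examples of the pattern, not our actual code):

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Compile-time checks for representation assumptions the low-level code
// relies on; these break the build instead of crashing on a user's machine.
static_assert(CHAR_BIT == 8, "code assumes 8 bits per byte");
static_assert(sizeof(float) == 4, "code assumes 32-bit floats");

struct Vertex
{
    float         position[3];
    float         normal[3];
    std::uint32_t color;
};

// Serialization and SIMD paths assume this exact size and a trivially
// copyable layout (memcpy-able, no hidden vtable or padding surprises).
static_assert(std::is_trivially_copyable<Vertex>::value,
              "Vertex must be trivially copyable");
static_assert(sizeof(Vertex) == 28, "unexpected padding in Vertex");
```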