I keep hearing people complaining that C++ doesn\'t have garbage collection. I also hear that the C++ Standards Committee is looking at adding it to the language. I\'m afrai
using RAII with smart pointers eliminates the need for it, right?
Smart pointers can be used to implement reference counting in C++ which is a form of garbage collection (automatic memory management) but production GCs no longer use reference counting because it has some important deficiencies:
Reference counting leaks cycles. Consider A↔B, both objects A and B refer to each other so they both have a reference count of 1 and neither is collected but they should both be reclaimed. Advanced algorithms like trial deletion solve this problem but add a lot of complexity. Using weak_ptr
as a workaround is falling back to manual memory management.
Naive reference counting is slow for several reasons. Firstly, it requires out-of-cache reference counts to be bumped often (see Boost's shared_ptr up to 10× slower than OCaml's garbage collection). Secondly, destructors injected at the end of scope can incur unnecessary-and-expensive virtual function calls and inhibit optimizations such as tail call elimination.
Scope-based reference counting keeps floating garbage around as objects are not recycled until the end of scope whereas tracing GCs can reclaim them as soon as they become unreachable, e.g. can a local allocated before a loop be reclaimed during the loop?
What advantages could garbage collection offer an experienced C++ developer?
Productivity and reliability are the main benefits. For many applications, manual memory management requires significant programmer effort. By simulating an infinite-memory machine, garbage collection liberates the programmer from this burden which allows them to focus on problem solving and evades some important classes of bugs (dangling pointers, missing free
, double free
). Furthermore, garbage collection facilitates other forms of programming, e.g. by solving the upwards funarg problem (1970).
The short answer is that garbage collection is very similar in principle to RAII with smart pointers. If every piece of memory you ever allocate lies within an object, and that object is only referred to by smart pointers, you have something close to garbage collection (potentially better). The advantage comes from not having to be so judicious about scoping and smart-pointering every object, and letting the runtime do the work for you.
This question seems analogous to "what does C++ have to offer the experienced assembly developer? instructions and subroutines eliminate the need for it, right?"
Garbage collection allows to postpone the decision about who owns an object.
C++ uses value semantics, so with RAII, indeed, objects are recollected when going out of scope. This is sometimes referred to as "immediate GC".
When your program starts using reference-semantics (through smart pointers etc...), the language does no longer support you, you're left to the wit of your smart pointer library.
The tricky thing about GC is deciding upon when an object is no longer needed.
Full-fledged GC that handles things like cyclic references would be somewhat of an upgrade over a ref-counted shared_ptr
. I would somewhat welcome it in C++, but not at the language level.
One of the beauties about C++ is that it doesn't force garbage collection on you.
I want to correct a common misconception: a garbage collection myth that it somehow eliminates leaks. From my experience, the worst nightmares of debugging code written by others and trying to spot the most expensive logical leaks involved garbage collection with languages like embedded Python through a resource-intensive host application.
When talking about subjects like GC, there's theory and then there's practice. In theory it's wonderful and prevents leaks. Yet at the theoretical level, so is every language wonderful and leak-free since in theory, everyone would write perfectly correct code and test every single possible case where a single piece of code could go wrong.
Garbage collection combined with less-than-ideal team collaboration caused the worst, hardest-to-debug leaks in our case.
The problem still has to do with ownership of resources. You have to make clear design decisions here when persistent objects are involved, and garbage collection makes it all too easy to think that you don't.
Given some resource, R
, in a team environment where the developers aren't constantly communicating and reviewing each other's code carefully at alll times (something a little too common in my experience), it becomes quite easy for developer A
to store a handle to that resource. Developer B
does as well, perhaps in an obscure way that indirectly adds R
to some data structure. So does C
. In a garbage-collected system, this has created 3 owners of R
.
Because developer A
was the one that created the resource originally and thinks he's the owner of it, he remembers to release the reference to R
when the user indicates that he no longer wants to use it. After all, if he fails to do so, nothing would happen and it would be obvious from testing that the user-end removal logic did nothing. So he remembers to release it, as any reasonably competent developer would do. This triggers an event for which B
handles it and also remembers to release the reference to R
.
However, C
forgets. He's not one of the stronger developers on the team: a somewhat fresh recruit who has only worked in the system for a year. Or maybe he's not even on the team, just a popular third party developer writing plugins for our product that many users add to the software. With garbage collection, this is when we get those silent logical resource leaks. They're the worst kind: they do not necessarily manifest in the user-visible side of the software as an obvious bug besides the fact that over durations of running the program, the memory usage just continues to rise and rise for some mysterious purpose. Trying to narrow down these issues with a debugger can be about as fun as debugging a time-sensitive race condition.
Without garbage collection, developer C
would have created a dangling pointer. He may try to access it at some point and cause the software to crash. Now that's a testing/user-visible bug. C
gets embarrassed a bit and corrects his bug. In the GC scenario, just trying to figure out where the system is leaking may be so difficult that some of the leaks are never corrected. These are not valgrind
-type physical leaks that can be detected easily and pinpointed to a specific line of code.
With garbage collection, developer C
has created a very mysterious leak. His code may continue to access R
which is now just some invisible entity in the software, irrelevant to the user at this point, but still in a valid state. And as C
's code creates more leaks, he's creating more hidden processing on irrelevant resources, and the software is not only leaking memory but also getting slower and slower each time.
So garbage collection does not necessarily mitigate logical resource leaks. It can, in less than ideal scenarios, make leaks far easier to silently go unnoticed and remain in the software. The developers might get so frustrated trying to trace down their GC logical leaks that they simply tell their users to restart the software periodically as a workaround. It does eliminate dangling pointers, and in a safety-obsessed software where crashing is completely unacceptable under any scenario, then I would prefer GC. But I'm often working in less safety-critical but resource-intensive, performance-critical products where a crash that can be fixed promptly is preferable to a really obscure and mysterious silent bug, and resource leaks are not trivial bugs there.
In both of these cases, we're talking about persistent objects not residing on the stack, like a scene graph in a 3D software or the video clips available in a compositor or the enemies in a game world. When resources tie their lifetimes to the stack, both C++ and any other GC language tend to make it trivial to manage resources properly. The real difficulty lies in persistent resources referencing other resources.
In C or C++, you can have dangling pointers and crashes resulting from segfaults if you fail to clearly designate who owns a resource and when handles to them should be released (ex: set to null in response to an event). Yet in GC, that loud and obnoxious but often easy-to-spot crash is exchanged for a silent resource leak that may never be detected.