I hear talk of C++14 introducing a garbage collector in the C++ standard library itself. What is the rationale behind this feature? Isn't this the reason that RAII exists in C++?
Definitions:
RCB GC: Reference-Counting Based GC.
MSB GC: Mark-Sweep Based GC.
Quick Answer:
MSB GC should be added to the C++ standard, because it is handier than RCB GC in certain cases.
Two illustrative implementations:
Consider a global buffer whose initial size is small; any thread can dynamically enlarge it while keeping the old contents accessible to other threads.
Implementation 1 (MSB GC Version):
int*   g_buf = nullptr;
size_t g_current_buf_size = 1024;

void InitializeGlobalBuffer()
{
    g_buf = gcnew int[g_current_buf_size];   // 'gcnew' stands for a hypothetical GC-managed allocation (syntax borrowed from C++/CLI)
}

int GetValueFromGlobalBuffer(size_t index)
{
    return g_buf[index];
}

void EnlargeGlobalBufferSize(size_t new_size)
{
    if (new_size > g_current_buf_size)
    {
        auto tmp_buf = gcnew int[new_size];
        memcpy(tmp_buf, g_buf, g_current_buf_size * sizeof(int));
        std::swap(tmp_buf, g_buf);
        g_current_buf_size = new_size;
        // The old buffer is never explicitly freed; the GC keeps it alive
        // as long as any thread still holds a pointer into it.
    }
}
Implementation 2 (RCB GC Version):
#include <cstring>   // memcpy
#include <memory>    // std::shared_ptr
#include <utility>   // std::swap

std::shared_ptr<int> g_buf;
size_t g_current_buf_size = 1024;

std::shared_ptr<int> NewBuffer(size_t size)
{
    return std::shared_ptr<int>(new int[size], [](int* p) { delete[] p; });
}

void InitializeGlobalBuffer()
{
    g_buf = NewBuffer(g_current_buf_size);
}

int GetValueFromGlobalBuffer(size_t index)
{
    return g_buf.get()[index];   // shared_ptr<int> has no operator[], so go through get()
}

void EnlargeGlobalBufferSize(size_t new_size)
{
    if (new_size > g_current_buf_size)
    {
        auto tmp_buf = NewBuffer(new_size);
        memcpy(tmp_buf.get(), g_buf.get(), g_current_buf_size * sizeof(int));
        std::swap(tmp_buf, g_buf);
        g_current_buf_size = new_size;
        //
        // Now tmp_buf owns the old buffer; when tmp_buf is destructed,
        // the old buffer will also be deleted.
        //
    }
}
PLEASE NOTE:
After calling std::swap(tmp_buf, g_buf);, tmp_buf owns the old g_buf. When tmp_buf is destructed, the old g_buf will also be deleted. If another thread is calling GetValueFromGlobalBuffer(index); at that moment to fetch a value from the old g_buf, a race hazard will occur!
So, though Implementation 2 looks as elegant as Implementation 1, it doesn't work!
If we want to make Implementation 2 work correctly, we must add some kind of locking mechanism; then it will be not only slower but also less elegant than Implementation 1.
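To make the point concrete, here is a minimal sketch of what one such fix could look like, assuming the C++11 atomic shared_ptr free functions std::atomic_load / std::atomic_store from <memory> (which are typically implemented with an internal lock); everything beyond the names used in Implementation 2 is illustrative:

#include <cstring>
#include <memory>

std::shared_ptr<int> g_buf;
size_t g_current_buf_size = 1024;

int GetValueFromGlobalBuffer(size_t index)
{
    // Take a local strong reference so the old buffer cannot die mid-read.
    std::shared_ptr<int> local = std::atomic_load(&g_buf);
    return local.get()[index];
}

void EnlargeGlobalBufferSize(size_t new_size)
{
    // Concurrent enlargers would additionally need a mutex; omitted here.
    if (new_size > g_current_buf_size)
    {
        std::shared_ptr<int> old = std::atomic_load(&g_buf);
        std::shared_ptr<int> tmp(new int[new_size], [](int* p) { delete[] p; });
        std::memcpy(tmp.get(), old.get(), g_current_buf_size * sizeof(int));
        std::atomic_store(&g_buf, tmp);   // publish; readers still holding 'old' keep it alive
        g_current_buf_size = new_size;
    }
}

Every read now pays for an atomic reference-count bump (and, in most implementations, an internal lock inside atomic_load/atomic_store), which is exactly the overhead and clutter that Implementation 1 avoids.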
Conclusion:
It would be good to add MSB GC to the C++ standard as an optional feature.
None of the answers so far touch upon the most important benefit of adding garbage-collection to a language: In the absence of language-supported garbage-collection, it's almost impossible to guarantee that no object will be destroyed while references to it exist. Worse, if such a thing does happen, it's almost impossible to guarantee that a later attempt to use the reference won't end up manipulating some other random object.
Although there are many kinds of objects whose lifetimes can be much better managed by RAII than by a garbage collector, there's considerable value in having the GC manage nearly all objects, including those whose lifetime is controlled by RAII. An object's destructor should kill the object and make it useless, but leave the corpse behind for the GC. Any reference to the object will thus become a reference to the corpse, and will remain one until it (the reference) ceases to exist entirely. Only when all references to the corpse have ceased to exist will the corpse itself do so.
While there are ways of implementing garbage collectors without inherent language support, such implementations either require that the GC be informed any time references are created or destroyed (adding considerable hassle and overhead), or run the risk that a reference the GC doesn't know about might exist to an object which is otherwise unreferenced. Compiler support for GC eliminates both those problems.
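As an illustration of the hazard being described (hypothetical names; the dangerous dereference is left commented out because it is undefined behaviour):

#include <iostream>

struct Widget { int id; };

int main()
{
    Widget* w = new Widget{1};
    Widget* stale = w;               // a second reference to the same object
    delete w;                        // object destroyed while 'stale' still exists

    Widget* other = new Widget{2};   // the allocator may reuse the freed block
    // std::cout << stale->id;       // undefined behaviour: 'stale' may now alias 'other',
                                     // the "some other random object" case described above
    std::cout << other->id << '\n';
    delete other;
}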
Garbage collection and RAII are useful in different contexts. The presence of GC should not affect your use of RAII. Since RAII is well-known, I give two examples where GC is handy.
Garbage collection would be a great help in implementing lock-free data structures.
[...] it turns out that deterministic memory freeing is quite a fundamental problem in lock-free data structures. (from "Lock-Free Data Structures" by Andrei Alexandrescu)
Basically the problem is that you have to make sure you are not deallocating the memory while a thread is reading it. That's where GC becomes handy: It can look at the threads and only do the deallocation when it is safe. Please read the article for details.
Just to be clear here: it doesn't mean that the WHOLE WORLD should be garbage collected as in Java; only the relevant data should be garbage collected accurately.
In one of his presentations, Bjarne Stroustrup also gave a good, valid example of where GC becomes handy. Imagine an application written in C/C++, 10M SLOC in size. The application works reasonably well (it is fairly bug-free), but it leaks. You have neither the resources (man-hours) nor the functional knowledge to fix this, and the source code is somewhat messy legacy code. What do you do? I agree that sweeping the problem under the rug with GC is perhaps the easiest and cheapest way out.
As it has been pointed out by sasha.sochka, the garbage collector will be optional.
My personal concern is that people would start using GC like it is used in Java and would write sloppy code and garbage collect everything. (I have the impression that shared_ptr has already become the default 'go to' even in cases where unique_ptr or, hell, stack allocation would do.)
I agree with @DeadMG that there is no GC in the current C++ standard, but I would like to add the following quotation from B. Stroustrup:
When (not if) automatic garbage collection becomes part of C++, it will be optional
So Bjarne is sure that it will be added in the future. At least the chairman of the EWG (Evolution Working Group), one of the most important committee members and, more importantly, the creator of the language, wants to add it.
Unless he has changed his opinion, we can expect it to be added and implemented in the future.
There are some algorithms that are complicated, inefficient, or impossible to write without a GC. I suspect this is the major selling point for GC in C++, and I can't ever see it being used as a general-purpose allocator.
Why not a general-purpose allocator?
First, we have RAII, and most (including me) seem to believe that this is a superior method of resource management. We like determinism because it makes writing robust, leak-free code a lot simpler and makes performance predictable.
Second, you'll need to place some very un-C++-like restrictions on how you can use memory. For instance, you'd need at least one reachable, un-obfuscated pointer to every live object. Obfuscated pointers, popular in common tree-container libraries (which use alignment-guaranteed low bits for color flags) among other places, won't be recognizable by the GC.
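For example, a typical red-black-tree node might pack the node's colour into the low bit of its parent pointer (a sketch; the field names are illustrative), so the value sitting in memory is no longer a pointer the GC can trace:

#include <cstdint>

struct Node
{
    Node*          left;
    Node*          right;
    std::uintptr_t parent_and_color;   // parent pointer with the colour packed into bit 0

    Node* parent() const
    {
        return reinterpret_cast<Node*>(parent_and_color & ~std::uintptr_t(1));
    }
    bool is_red() const { return (parent_and_color & 1) != 0; }
    void set_parent(Node* p, bool red)
    {
        // Legal because Node is aligned to more than 1 byte, so bit 0 of a real
        // Node* is always zero -- but a GC scanning this field sees an
        // "obfuscated" value, not a pointer it knows how to follow.
        parent_and_color = reinterpret_cast<std::uintptr_t>(p) | std::uintptr_t(red);
    }
};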
Related to that, the things which make modern GCs so usable are going to be very difficult to apply to C++ if you support any number of obfuscated pointers. Generational defragmenting GCs are really cool, because allocating is extremely cheap (essentially just incrementing a pointer) and eventually your allocations get compacted into something smaller with improved locality. To do this, objects need to be movable.
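As a rough sketch of why allocation in such a collector is so cheap (an illustrative fixed-size arena, not any real collector's code), the hot path is just an aligned pointer bump:

#include <cstddef>
#include <cstdint>

class BumpArena
{
public:
    explicit BumpArena(std::size_t bytes)
        : begin_(new char[bytes]), cur_(begin_), end_(begin_ + bytes) {}
    ~BumpArena() { delete[] begin_; }

    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t))
    {
        std::uintptr_t p = reinterpret_cast<std::uintptr_t>(cur_);
        p = (p + align - 1) & ~(std::uintptr_t(align) - 1);   // round up to 'align'
        char* aligned = reinterpret_cast<char*>(p);
        if (aligned + size > end_)
            return nullptr;            // a real generational GC would collect/compact here
        cur_ = aligned + size;         // the "allocation" itself was just arithmetic
        return aligned;
    }

private:
    char* begin_;
    char* cur_;
    char* end_;
};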
To make an object safely movable, the GC needs to be able to update all the pointers to it, and it won't be able to find obfuscated ones. This could be accommodated, but it wouldn't be pretty (probably a gc_pin type or similar, used like today's std::lock_guard, that you would have to use whenever you need a raw pointer). Usability would be out the door.
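Purely as a sketch of what that might look like (gc_ptr and gc_pin are made-up names used only to illustrate the usability cost; the stubs below do nothing real):

#include <cstdio>

template <typename T>
struct gc_ptr                // stand-in for a GC-managed, movable reference
{
    T* raw;                  // a real implementation would not expose this directly
};

template <typename T>
class gc_pin                 // RAII pin, used like std::lock_guard
{
public:
    explicit gc_pin(gc_ptr<T>& p) : p_(p) { /* tell the GC: do not move *p */ }
    ~gc_pin() { /* unpin: the object may be moved/compacted again */ }
    T* get() const { return p_.raw; }   // raw pointer valid only while pinned
private:
    gc_ptr<T>& p_;
};

void legacy_c_api(int* data) { std::printf("%d\n", *data); }

int main()
{
    int storage = 42;
    gc_ptr<int> managed{&storage};

    gc_pin<int> pin(managed);      // pin for the duration of this scope
    legacy_c_api(pin.get());       // the raw pointer must not outlive 'pin'
    return 0;
}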
Without making things movable, a GC would be significantly slower and less scalable than what you're used to elsewhere.
With the usability reasons (resource management) and efficiency reasons (fast, movable allocations) out of the way, what else is GC good for? Certainly not general-purpose use. Enter lock-free algorithms.
Why lock-free?
Lock-free algorithms work by letting an operation under contention go temporarily "out of sync" with the data structure and detecting/correcting this at a later step. One effect of this is that under contention memory might be accessed after it has been deleted. For example, if you have multiple threads competing to pop a node from a LIFO, it is possible for one thread to pop and delete the node before another thread has realized the node was already taken:
Thread A: reads the current head pointer and starts to pop it.
Thread B: reads the same head pointer and also starts to pop it.
Thread A: wins the race, completes the pop, and deletes the node.
Thread B: still holds the stale pointer and dereferences the deleted node (e.g. to read its next pointer), touching freed memory.
With GC you can avoid the possibility of reading from uncommitted memory because the node would never be deleted while Thread B is referencing it. There are ways around this, such as hazard pointers or catching SEH exceptions on Windows, but these can hurt performance significantly. GC tends to be the most optimal solution here.
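A minimal sketch of the pop in question, assuming a Treiber-style lock-free stack (the names are illustrative), showing exactly where GC removes the problem:

#include <atomic>

struct Node
{
    int   value;
    Node* next;
};

std::atomic<Node*> g_head{nullptr};

bool Pop(int& out)
{
    Node* old_head = g_head.load();
    while (old_head != nullptr)
    {
        // Dereferencing old_head here is only safe if no other thread can have
        // freed it since the load above -- which is what a GC (or hazard
        // pointers / epoch-based reclamation) guarantees.
        Node* next = old_head->next;
        if (g_head.compare_exchange_weak(old_head, next))
        {
            out = old_head->value;
            // With GC: simply drop the pointer; the node is reclaimed once no
            // thread can reach it. Without GC: 'delete old_head;' here is what
            // creates the race illustrated above.
            return true;
        }
        // compare_exchange_weak reloaded old_head on failure; retry.
    }
    return false;
}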
GC also has advantages of its own. I am not saying that GC is the best choice, or even a good one; I am just saying that it has different characteristics, and in some scenarios those can be an advantage.