问题
When reading the Mark-Compact chapter on The Garbage Collection Handbook, a sequence of alternatives were presented, but most of them looked old / theoretical (for instance, the 2-finger compaction and the Lisp2 3-pass approach requiring an extra header word per object).
Is anyone aware of what algorithm does HotSpot uses when running Mark-Compact (in its old-generation, I assume)?
Thanks
回答1:
Big disclaimer: I am not a GC expert/writer; all the things that are written bellow are subject to changes and some of them might be way too simplistic. Please take this with a grain of salt.
I will only speak about Shenandoah
, as I think I understand it; which is not a generational GC.
There are two phases here actually: Mark
and Compact
. I would strongly emphases here that both are concurrent and do happen while your application is running (with some very short STW events).
And now to the details. I have explained a bit of things here, but because that answer is related to somehow a different question; I'll explain more here. I assume that traversing the graph of live objects is no news for you, after all you are reading a book about GC
. As that answer explains, when the application is fully stopped (also called brought to safe-points), identifying live objects is easy. No one is changing anything under you feet, the floor is rigid and you control everything. Parallel collectors do this.
The really painful way is to do things concurrent. Shenandoah employs an algorithm called Snapshot at the beginning
(that book explains it AFAIK), will call it SATB
for short. Basically this algorithm is implemented like this: "I will start to scan concurrently the graph of objects (from GC roots), if anything changes while I scan, I will not alter the heap, but will record these changes and deal with them later".
The very first part that you need to question is : while I scan. How is that achieved? Well, before doing the concurrent mark
, there is a STW event
called Initial Mark
. One of the things that gets done in that phase is to set a flag that concurrent marking has started. Later, while executing code, that flag is checked (Shenandoah
thus employs changes in the interpreter). In pseudo-code:
if(!concurrentMarkingActive) {
// do whatever you were doing and alter the heap
} else {
// shenandoah magic
}
In machine code that might look like this:
test %r11, %r11 (test concurrentMarkingActive flag)
jne // concurrent marking is currently active
Now GC knows when concurrent marking is happening.
But how is concurrent marking even implemented. How can you scan the heap while the heap itself is mutated (not stable)? The floor under your feet adds more holes and removes them also.
That is the "shenandoah magic". Changes to the heap are "intercepted" and not persisted directly. So if GC performs a concurrent mark at this point in time, and application code tries to mutate the heap, those changes are recorded in each threads SATB queues
(snapshot at the beginning). When concurrent mark is over, those queues are drained (via a STW event
called Final Mark
) and those changes that were drained are analyzed again (remember under a STW event
now).
When this phase Final Mark is over GC knows what is alive and thus what is implicitly garbage.
Compact phase is next. Shenandoah
is now supposed to move live objects to different regions (in a compact fashion) and mark the current region as one where we can allocate again. Of course, in a simple STW phase
, this would be easy : move the object, update references pointing to it. Done. When you have to do it concurrently...
You can't take the object and simply move it to a different region and then update your references one by one. Think about it, let's suppose this is the first state we have:
refA, refB
|
---------
| i = 0 |
| j = 0 |
---------
There are two references to this instance: refA
and refB
. We create a copy of this object:
refA, refB
|
--------- ---------
| i = 0 | | i = 0 |
| j = 0 | | j = 0 |
--------- ---------
We created a copy, but have not updated any references yet. We now move a single reference to point to the copy:
refA refB
| |
--------- ---------
| i = 0 | | i = 0 |
| j = 0 | | j = 0 |
--------- ---------
And now the interesting part: ThreadA
does refA.i = 5
, while ThreadB
does refB.j = 6
so your state becomes:
refA refB
| |
--------- ---------
| i = 5 | | i = 0 |
| j = 0 | | j = 6 |
--------- ---------
How do you merge these objects now? I'll be honest - I have no idea if that would even be possible and neither is this a route that Shenandoah
took.
Instead, the solution from Shenandoah
does a very interesting thing IMHO. An extra pointer added to each instance, also called forwarding pointer:
refA, refB
|
fwdPointer1
|
---------
| i = 0 |
| j = 0 |
---------
refA
and refB
points to fwdPointer1
, while fwdPointer1
to the real Object. Let's create the copy now:
refA, refB
|
fwdPointer1 fwdPointer2
| |
--------- ---------
| i = 0 | | i = 0 |
| j = 0 | | j = 0 |
--------- ---------
And now, we want to switch all references (refA
and refB
) to point to the copy. If you look closely, this requires only a single pointer change - fwdPointer1
. Make fwdPointer1
point to fwdPointer2
and you are done. This means one single change as opposed to two (in this set-up) of refA
and refB
. The bigger win here is that you don't need to scan the heap and find out references that point to your instance.
Is there a way to atomically update a reference? Of course : AtomicReference
(at least in java). The idea here is almost the same, we atomically change the fwdPointer1
via a CAS
(compare and swap), as such:
refA, refB
|
fwdPointer1 ---- fwdPointer2
|
--------- ---------
| i = 0 | | i = 0 |
| j = 0 | | j = 0 |
--------- ---------
So, refA
and refB
point to fwdPointer1
, which now points to the copy we have created. Via a single CAS
operation, we have switched concurrently all references to the newly created copy.
But, we need to understand the drawbacks, there is no free lunch.
First, is obvious :
Shenandoah
adds a machine header that each instance in the heap (read further, as this is false; but makes understanding easier).Each of these copies will generate an extra object in the new region, thus at some point there will be at least two copies of the same object (extra space required for
Shenandoah
to function, as such).When
ThreadA
doesrefA.i = 5
(from the previous example), how does it know if it should try to create a copy, write to that copy andCAS
thatforwarding pointer
vs simply do a write to the object? Remember that this happens concurrently. Same solution as withconcurrentMarkingActive
flag. There is a flagisEvacuationToADifferentRegionActive
(not the actual name). If that flag istrue
=> Shenandoah Magic, else simply do the write as it.
If you really understood this last point, your natural question should be :
"WAIT A SECOND! Does this mean that Shenandoah does an
if/else
againstisEvacuationToADifferentRegionActive
for EACH AND SINGLE write to an instance - be that primitive or reference? Also does that mean that EACH read must be accessed via theforwarding pointer
?"
The answer used to be YES; but things have changed: via this issue (though I make it sound a lot worse than it really is). Now they use Load
barriers for the entire Object, more details here. Instead of having a barrier on each write (that if/else
against the flag) and a dereference via the forwarding pointer
for each read, they moved to a load barrier
. Basically do that if/else
only when you load the object. Since writing to it implies reading first, they thus preserve "to-space invariant". Apparently this is simpler, better and easier to optimize. Hooray!
Remember that forwarding pointer
? Well, it does not exist anymore. I don't understand the details in its entire glory (yet), but it has to do something with the possibility to use the mark word
and the from space
that, since the addition of load barriers, is not used anymore. A lot more details here. Once I understand how that really works internally, will update the post.
G1
is not VERY much different than what Shenandoah
is, but the devil is in the details. For example Compact
phase in G1
is a STW
event, always. G1
is always generational - even if you want that or not (Shenandoah
can be sort of like that - there is a setting to control this), etc.
来源:https://stackoverflow.com/questions/59044616/whats-the-mark-compact-algorithm-used-by-hotspot