What is the remembered set in the G1 algorithm used for?

Question

I just read some blog posts about the G1 algorithm.

The purpose of the remembered set confuses me.

Here is what I think:

Since we can use DFS to walk every reference starting from the GC roots, why do we need a remembered set?

All the blogs say the reason for the remembered set is that we then don't need to check every region to see whether it contains an object that is referenced from the GC roots.

Answer 1:

You need to understand what a card table is first, IMO. How do you scan only the young generation and clean it if there are references from the old generation back into the young one? You need to track exactly where these connections are, so that while scanning the young generation you can clean it without breaking the heap.

Think about it: you can't mark an object A in the young generation for removal if there is a reference B to it from the old generation. But remember that right now you are doing a young collection only. To track these "connections", a card table is used. Each entry in the card table (one byte per card in HotSpot) says that a certain portion of the old generation is "dirty", meaning that this portion of the old generation must also be scanned while scanning young.
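
To make the idea concrete, here is a minimal toy sketch of a card table in Java. It is not HotSpot's implementation: the class name `CardTableSketch`, the method names, and the 512-byte card size are all assumptions chosen for illustration, and addresses are modeled as plain offsets into the old generation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a card table; hypothetical names, not HotSpot code.
final class CardTableSketch {
    // Assumed card size: 2^9 = 512 bytes of old generation per card.
    static final int CARD_SHIFT = 9;

    // One entry per card; a non-zero value means "dirty".
    private final byte[] cards;

    CardTableSketch(long oldGenBytes) {
        long cardSize = 1L << CARD_SHIFT;
        cards = new byte[(int) ((oldGenBytes + cardSize - 1) >>> CARD_SHIFT)];
    }

    // Hypothetical write barrier: invoked after a reference field at
    // fieldOffset (relative to the start of the old generation) is written.
    void postWriteBarrier(long fieldOffset) {
        cards[(int) (fieldOffset >>> CARD_SHIFT)] = 1;
    }

    // A young collection scans only these cards, not the whole old generation.
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) {
                dirty.add(i);
            }
        }
        return dirty;
    }
}
```

With a 4096-byte old generation there are 8 cards; writes at offsets 0, 511 and 1030 dirty only cards 0 and 2, so those are the only parts of the old generation a young collection would need to look at.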

Why do you need that? The entire point of a young collection is to scan a small piece of the heap, not all of it. The card table achieves that.

G1 has regions. What if you are scanning regionA and you see that it has pointers to some other regionB? Simply putting this information in the Card Table is not enough. Your card table will only know about regionA, and next time you scan regionB - how do you know you are supposed to scan regionA also? If you don't do that, obviously the heap integrity is broken.

Hence remembered sets. These sets are populated by an asynchronous thread: it scans the card table and, based on that information, also finds where the "dirty" regions have pointers to. It keeps track of that regionA -> regionB connection. Each region has its own remembered set.

So when a GC needs to happen, while scanning regionB you also look at its remembered set and find out that you need to scan regionA too.
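
The regionA -> regionB bookkeeping described above can be sketched roughly as follows. This is a simplified model under stated assumptions: `RememberedSetSketch` and its methods are hypothetical names, and it records whole source regions, whereas the real G1 tracks incoming references at a finer granularity (cards).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of per-region remembered sets; hypothetical names, not G1 code.
final class RememberedSetSketch {
    // Target region id -> ids of the regions that hold pointers INTO it.
    private final Map<Integer, Set<Integer>> rsets = new HashMap<>();

    // Called (conceptually by the refinement thread) when a pointer from
    // fromRegion to an object in toRegion is discovered.
    void recordReference(int fromRegion, int toRegion) {
        if (fromRegion == toRegion) {
            return; // intra-region pointers need no remembered-set entry
        }
        rsets.computeIfAbsent(toRegion, k -> new TreeSet<>()).add(fromRegion);
    }

    // When collecting `region`, these are the other regions to scan as well.
    Set<Integer> regionsToScanWhenCollecting(int region) {
        return rsets.getOrDefault(region, Collections.emptySet());
    }
}
```

If regions 0 and 2 both point into region 1, then collecting region 1 means consulting its remembered set and scanning regions 0 and 2 as well, instead of walking the whole heap.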

In practice, this is why G1 became generational: these remembered sets turned out to be huge. If you divide the heap into young and old, there is no need to keep the connections between young regions, since you scan them all at once anyway, which takes away some of the burden on the size of these sets. G1 wants to keep that 200ms (default) pause promise. To do that, you need to scan the young generation all at once (because the remembered sets hold no connections between young regions, and otherwise heap integrity is gone), but at the same time, if you make the young generation small, the remembered sets will be big.

As such, tuning these settings well is an engineering miracle, IMHO.

Source: https://stackoverflow.com/questions/61936621/what-is-remembered-set-in-g1-algorithms-used-for

Tags

java

jvm

g1gc