I am running a Java program with the G1 garbage collector using the following options:
-XX:-UseBiasedLocking
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
A blog post explains every part of the GC log. Here is a quote:
0.522: [GC pause (young), 0.15877971 secs]
This is the highest level information telling us that it is an Evacuation Pause that started at 0.522 secs from the start of the process, in which all the regions being evacuated are Young i.e. Eden and Survivor regions. This collection took 0.15877971 secs to finish.
Evacuation Pauses can be mixed as well. In which case the set of regions selected include all of the young regions as well as some old regions.
1.730: [GC pause (mixed), 0.32714353 secs]
Let's take a look at all the sub-tasks performed in this Evacuation Pause.
[Parallel Time: 157.1 ms]
Parallel Time is the total elapsed time spent by all the parallel GC worker threads. The following lines correspond to the parallel tasks performed by these worker threads in this total parallel time, which in this case is 157.1 ms.
[GC Worker Start (ms): 522.1 522.2 522.2 522.2 Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1]
The first line tells us the start time of each of the worker thread in milliseconds. The start times are ordered with respect to the worker thread ids – thread 0 started at 522.1ms and thread 1 started at 522.2ms from the start of the process. The second line tells the Avg, Min, Max and Diff of the start times of all of the worker threads.
I am in no way well versed on the Garbage-First Garbage Collector and this question inspired me to check it out for the first time. There is a chance I may be wrong with some of my information.
To get more information about the Garbage-First Garbage Collector (G1GC), the Garbage-First Garbage Collection paper (here, here, here and here) is an invaluable resource. You can find an introduction to the G1GC here. The HotSpot Glossary comes in handy to understand JVM terms. The following papers were also helpful in further understanding Garbage Collection:
With these resources and an OpenJDK 7 debug build, you can start to understand the G1GC log.
Using the papers and web pages referenced above, here are some useful definitions of terms that came up most often:
To understand the G1GC log better, I used the following OpenJDK 7 source files:
Below is an annotated copy of the G1GC log in question.
44900.297: [GC pause (young) (initial-mark), 0.08894851 secs]
^^^^^^^^^^^ Elapsed seconds from JVM start
^^^^^^^^^ Collection pause at safe-point
^^^^^^^^ In full young GC mode
^^^^^^^^^^^^^^^ Last pause included initial mark
^^^^^^^^^^^^^^^^^^ Elapsed seconds in method
44900.386: [GC concurrent-mark-start]
^^^^^^^^^^^ Elapsed seconds from JVM start
^^^^^^^^^^^^^^^^^^^^^^^^^^ Concurrent mark thread started
[Parallel Time: 83.7 ms]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Elapsed milliseconds for all GC worker threads to finish
[GC Worker Start Time (ms): 44900297.6 44900297.6 44900297.6 44900297.6 44900297.6 44900297.7 44900297.7 44900297.7 44900297.7 44900297.7 44900297.7 44900297.7 44900297.7
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed seconds from JVM start that GC worker threads were started
Avg: 44900297.7, Min: 44900297.6, Max: 44900297.7, Diff: 0.1]
^^^^^^^^^^^^^^^^^^^^^^ Average GC worker thread start time (elapsed seconds from JVM start)
^^^^^^^^^^^^^^^^^ Minimum GC worker thread start time (elapsed seconds from JVM start)
^^^^^^^^^^^^^^^^^ Maximum GC worker thread start time (elapsed seconds from JVM start)
^^^^^^^^^^^^^^ Total seconds to start all GC worker threads
[Update RS (ms): 23.5 24.3 25.0 25.0 23.9 24.4 25.2 24.1 25.7 24.7 24.8 24.4 24.7
^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to update cards in remembered sets during an evacuation pause
Avg: 24.6, Min: 23.5, Max: 25.7, Diff: 2.1]
^^^^^^^^^^^^^^^^^ Average GC worker thread milliseconds to update RS
^^^^^^^^^^^^ Minimum GC worker thread milliseconds to update RS
^^^^^^^^^^^^ Maximum GC worker thread milliseconds to update RS
^^^^^^^^^^^^^^ Minimum/maximum delta of GC worker thread milliseconds to update RS
[Processed Buffers : 16 19 19 23 20 24 18 18 18 17 20 16 19
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Total remembered set buffers processed by each GC worker thread
Sum: 247, Avg: 19, Min: 16, Max: 24, Diff: 8]
^^^^^^^^^^... Summary information for total remembered set buffers processed by all GC worker thread
[Ext Root Scanning (ms): 2.2 2.7 2.2 2.6 3.0 3.1 2.2 1.1 2.3 3.0 2.2 2.4 2.9
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to process heap roots during an evacuation pause
Avg: 2.4, Min: 1.1, Max: 3.1, Diff: 2.0]
^^^^^^^... Summary information for total elapsed milliseconds for all GC worker thread to process heap roots during an evacuation pause
[Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to scan strong roots in the mark stack during an evacuation pause
Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
^^^^^^^... Summary information for total milliseconds for all GC worker thread to scan strong roots in the mark stack during an evacuation pause
[Scan RS (ms): 14.1 14.6 14.5 14.3 14.6 14.2 14.4 14.5 14.0 13.9 14.6 14.5 14.0
^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to scan for dirty cards in a heap region to update the remembered set
Avg: 14.3, Min: 13.9, Max: 14.6, Diff: 0.8]
^^^^^^^^^^^^^... Summary information for total elapsed milliseconds for all GC worker thread to scan for dirty cards in a heap region to update the remembered set
[Object Copy (ms): 41.4 39.5 39.4 39.0 39.6 39.5 39.1 41.4 39.0 39.3 39.3 39.8 39.5
^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to scan root sets for pointers to include in this heap region's remembered set during an evacuation pause
Avg: 39.7, Min: 39.0, Max: 41.4, Diff: 2.4]
^^^^^^^^^^^^^... Summary information for total elapsed milliseconds for all GC worker thread to scan root sets for pointers to include in this heap region's remembered set during an evacuation pause
[Termination (ms): 1.3 1.4 1.5 1.6 1.5 1.4 1.6 1.4 1.5 1.7 1.5 1.4 1.3
^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to terminate scanning root sets for pointers to include in this heap region's remembered set during an evacuation pause
Avg: 1.5, Min: 1.3, Max: 1.7, Diff: 0.4]
^^^^^^^^^^^^^^... Summary information for total elapsed milliseconds for all GC worker thread to terminate scanning root sets for pointers to include in this heap region's remembered set during an evacuation pause
[Termination Attempts : 1185 1205 1219 1436 1171 1231 1471 1237 1461 1526 1353 1259 1170
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Total attempts for each GC worker thread to terminate scanning root sets for pointers to include in this heap region's remembered set during an evacuation pause
Sum: 16924, Avg: 1301, Min: 1170, Max: 1526, Diff: 356]
^^^^^^^^^^^^^^^... Summary information for total attempts for all GC worker thread to terminate scanning root sets for pointers to include in this heap region's remembered set during an evacuation pause
[GC Worker End Time (ms): 44900380.2 44900380.2 44900380.2 44900380.2 44900380.2 44900380.3 44900380.2 44900380.2 44900380.2 44900380.2 44900380.2 44900380.2 44900380.2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed seconds from JVM start that GC worker threads ended
Avg: 44900380.2, Min: 44900380.2, Max: 44900380.3, Diff: 0.1]
^^^^^^^^^^^^... Summary information for total elapsed seconds from JVM start that GC worker threads ended for all GC worker threads
[GC Worker Times (ms): 82.6 82.6 82.6 82.6 82.6 82.6 82.5 82.6 82.5 82.5 82.5 82.5 82.5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker threads
Avg: 82.6, Min: 82.5, Max: 82.6, Diff: 0.1]
^^^^^^^^^^^^^... Summary information for total elapsed milliseconds for all GC worker threads
[Other: 1.2 ms]
^^^^^^^^^^^^^^^^^^^^^^^ Other milliseconds during GC parallel processing; basically, total milliseconds minus: update RS, ext root scanning, mark stack scanning, scan RS, object copy, termination
[Clear CT: 0.5 ms]
^^^^^^^^^^^^^^^^^^^^^^^ Total milliseconds cleaning up the dirty card tables list
[Other: 4.8 ms]
^^^^^^^^^^^^^^^^^^^^ Total milliseconds in this GC pause not spent in parallel and clear CT
[Choose CSet: 0.0 ms]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Total milliseconds to choose young heap regions to add to the collection set
[ 1331M->830M(1840M)]
^^^^^ Heap size change for this garbage collector
^^^^^ Total heap size before this collection pause
^^^^ Total heap size after this collection pause
^^^^^ Total heap size capacity for this garbage collector
[Times: user=1.07 sys=0.01, real=0.09 secs]
^^^^^^^^ Execution seconds (uses TraceCPUTime class; constructor initializes times to zero, destructor prints times)
^^^^^^^^^^ User execution seconds for all threads
^^^^^^^^^ System execution seconds
^^^^^^^^^^^ Real (wall clock) execution seconds
44901.205: [GC concurrent-mark-end, 0.8186002 sec]
^^^^^^^^^^^ Elapsed seconds from JVM start
^^^^^^^^^^^^^^^^^^^^^^^^ Concurrent mark thread ended
^^^^^^^^^^^^^ Total seconds executing concurrent mark thread
44901.205: [GC remark, 0.0258621 secs]
^^^^^^^^^^^ Elapsed seconds from JVM start
^^^^^^^^^^ Concurrent re-marking of all heap roots, final work
[Times: user=0.02 sys=0.00, real=0.03 secs]
^^^^^^^^ Execution seconds (uses TraceCPUTime class; constructor initializes times to zero, destructor prints times)
^^^^^^^^^^ User execution seconds for all threads
^^^^^^^^^ System execution seconds
^^^^^^^^^^^ Real (wall clock) execution seconds
44901.231: [GC concurrent-count-start]
^^^^^^^^^^^ Elapsed seconds from JVM start
^^^^^^^^^^^^^^^^^^^^^^^^^^^ Start concurrent marking of live objects
44901.479: [GC concurrent-count-end, 0.2478477]
^^^^^^^^^^^ Elapsed seconds from JVM start
^^^^^^^^^^^^^^^^^^^^^^^^^^^ End concurrent marking of live objects
^^^^^^^^^^^^ Total seconds to do concurrent marking of live objects
44901.479: [GC cleanup 940M->931M(1840M), 0.0073079 secs]
^^^^^^^^^^^ Elapsed seconds from JVM start
^^^^^^^^^^^ Concurrent marking cleanup (world is stopped at this checkpoint)
^... Heap size change for this garbage collector
^^^^ Total heap size before this operation
^^^^ Total heap size after this operation
^^^^^ Total heap size capacity for this garbage collector
^^^^^^^^^^^^^^ Total seconds executing concurrent marking cleanup
If you can build a debug copy of OpenJDK 7, you'll have a few more JVM options available to you that will give you even further information about G1GC. Use the following command to get a list of all JVM options:
java -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -XX:+PrintFlagsWithComments -version
The output shows the following G1GC options available; this is not an all inclusive list, just some I thought were intersting.
intx G1CardCountCacheExpandThreshold = 16 Expand the card count cache if the number of collisions for a particular entry exceeds this value.
uintx G1ConcMarkForceOverflow = 0 The number of times we'll force an overflow during concurrent marking
double G1ConcMarkStepDurationMillis = 10.000000 {product} Target duration of individual concurrent marking steps in milliseconds.
intx G1ConcRSHotCardLimit = 4 The threshold that defines (>=) a hot card.
intx G1ConcRSLogCacheSize = 10 Log base 2 of the length of conc RS hot-card cache.
bool G1ConcRegionFreeingVerbose = false Enables verboseness during concurrent region freeing
intx G1ConfidencePercent = 50 {product} Confidence level for MMU/pause predictions
bool G1DeferredRSUpdate = true If true, use deferred RS updates
bool G1FixedEdenSize = false When set, G1 will not allocate unused survivor space regions
uintx G1FixedSurvivorSpaceSize = 0 If non-0 is the size of the G1 survivor space, otherwise SurvivorRatio is used to determine the size
bool G1FixedTenuringThreshold = false When set, G1 will not adjust the tenuring threshold
bool G1Gen = true If true, it will enable the generational G1
uintx G1HeapRegionSize = 0 {product} Size of the G1 regions.
intx G1InitYoungSurvRatio = 50 Expected Survival Rate for newly allocated bytes
intx G1MarkRegionStackSize = 1048576 {product} Size of the region stack for concurrent marking.
intx G1MarkingOverheadPercent = 0 Overhead of concurrent marking
intx G1MarkingVerboseLevel = 0 Level (0-4) of verboseness of the marking code
intx G1MaxHotCardCountSizePercent = 25 The maximum size of the hot card count cache as a percentage of the number of cards for the maximum heap.
intx G1MaxVerifyFailures = -1 The maximum number of verification failrues to print. -1 means print all.
intx G1PausesBtwnConcMark = -1 If positive, fixed number of pauses between conc markings
intx G1PolicyVerbose = 0 The verbosity level on G1 policy decisions
bool G1PrintCTFilterStats = false If true, print stats on RS filtering effectiveness
bool G1PrintHeapRegions = false {diagnostic} If set G1 will print information on which regions are being allocated and which are reclaimed.
bool G1PrintOopAppls = false When true, print applications of closures to external locs.
bool G1PrintParCleanupStats = false When true, print extra stats about parallel cleanup.
bool G1PrintReachableAtInitialMark = false Reachable object dump at the initial mark pause
ccstr G1PrintReachableBaseFile = The base file name for the reachable object dumps
bool G1PrintRegionLivenessInfo = false {product} Prints the liveness information for all regions in the heap at the end of a marking cycle.
bool G1RSCountHisto = false If true, print a histogram of RS occupancies after each pause
bool G1RSLogCheckCardTable = false If true, verify that no dirty cards remain after RS log processing.
bool G1RSScrubVerbose = false When true, do RS scrubbing with verbose output.
intx G1RSetRegionEntries = 0 {product} Max number of regions for which we keep bitmaps.Will be set ergonomically by default
intx G1RSetRegionEntriesBase = 256 Max number of regions in a fine-grain table per MB.
uintx G1RSetScanBlockSize = 64 {product} Size of a work unit of cards claimed by a worker threadduring RSet scanning.
intx G1RSetSparseRegionEntries = 0 {product} Max number of entries per region in a sparse table.Will be set ergonomically by default.
intx G1RSetSparseRegionEntriesBase = 4 Max number of entries per region in a sparse table per MB.
intx G1RSetUpdatingPauseTimePercent = 10 {product} A target percentage of time that is allowed to be spend on process RS update buffers during the collection pause.
bool G1RecordHRRSEvents = false When true, record recent calls to rem set operations.
bool G1RecordHRRSOops = false When true, record recent calls to rem set operations.
intx G1RefProcDrainInterval = 10 {product} The number of discovered reference objects to process before draining concurrent marking work queues.
intx G1ReservePercent = 10 {product} It determines the minimum reserve we should have in the heap to minimize the probability of promotion failure.
uintx G1SATBBufferEnqueueingThresholdPercent = 60 {product} Before enqueueing them, each mutator thread tries to do some filtering on the SATB buffers it generates. If post-filtering the percentage of retained entries is over this threshold the buffer will be enqueued for processing. A value of 0 specifies that mutator threads should not do such filtering.
intx G1SATBBufferSize = 1024 {product} Number of entries in an SATB log buffer.
bool G1SATBPrintStubs = false If true, print generated stubs for the SATB barrier
intx G1SATBProcessCompletedThreshold = 20 Number of completed buffers that triggers log processing.
bool G1ScrubRemSets = true When true, do RS scrubbing after cleanup.
uintx G1SecondaryFreeListAppendLength = 5 The number of regions we will add to the secondary free list at every append operation
bool G1StressConcRegionFreeing = false It stresses the concurrent region freeing operation
uintx G1StressConcRegionFreeingDelayMillis = 0 Artificial delay during concurrent region freeing
bool G1SummarizeConcMark = false {diagnostic} Summarize concurrent mark info
bool G1SummarizeRSetStats = false {diagnostic} Summarize remembered set processing info
intx G1SummarizeRSetStatsPeriod = 0 {diagnostic} The period (in number of GCs) at which we will generate update buffer processing info (0 means do not periodically generate this info); it also requires -XX:+G1SummarizeRSetStats
bool G1TraceConcRefinement = false {diagnostic} Trace G1 concurrent refinement
bool G1TraceMarkStackOverflow = false If true, extra debugging code for CM restart for ovflw.
intx G1UpdateBufferSize = 256 {product} Size of an update buffer
bool G1VerifyDuringGCPrintReachable = false If conc mark verification fails, dump reachable objects
intx G1YoungSurvRateNumRegionsSummary = 0 the number of regions for which we'll print a surv rate summary.
bool G1YoungSurvRateVerbose = false print out the survival rate of young regions according to age.
To see what kind of G1GC information is provided using the OpenJDK 7 debug build, I used the following test program:
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
public class G1GCTest implements Runnable {
private int iterations = 0;
private final Random rnd = new Random();
private final List<Object> young = new ArrayList<Object>(1000);
private final List<WeakReference<Object>> old =
new ArrayList<WeakReference<Object>>(100000);
private void clearOld() {
int clearedCnt = 0;
for (final Iterator<WeakReference<Object>> iter = old.iterator(); iter
.hasNext();)
{
final WeakReference<Object> ref = iter.next();
if (null == ref.get()) {
iter.remove();
clearedCnt++;
}
}
if (0 < clearedCnt) {
System.out.println("Cleared " + clearedCnt
+ " weak references to old objects.");
}
}
public void run() {
if (0 == ++iterations % 10000) {
System.out.println("iterations=" + iterations + ", young.size()="
+ young.size() + ", old.size()=" + old.size());
clearOld();
try {
Thread.sleep(100);
} catch (final Throwable e) {
// Do nothing!
}
}
if (rnd.nextBoolean()) {
young.add(new byte[1000]);
}
if (rnd.nextBoolean() && !young.isEmpty()) {
final int nextInt = Math.abs(rnd.nextInt());
final int idx = nextInt % young.size();
final Object obj = young.remove(idx);
old.add(new WeakReference<Object>(obj));
}
}
public static void main(final String[] args) {
final G1GCTest t = new G1GCTest();
while (1000000 > t.iterations) {
t.run();
}
}
}
And executed it using:
java -Xms8m -Xmx8m -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats -XX:+G1YoungSurvRateVerbose -XX:G1PolicyVerbose=2 -verbose:gc G1GCTest
The output of which was very verbose, but provides a slew of information that could prove useful if you're doing some GC tuning.
I have a tool (pre-release, not OSS yet) which may be able to shed some light here.
If you're interested, please email me and we can talk about how to get the logs to me.
To the comments on the post - I've used G1 in prod-like loads and seen some good numbers (not as good as the ones you're seeing, but better than CMS for some loads).
Haven't seen any crashes from Java 7 betas (but did see some with Java 6 builds).