I understand that time taken by YGC is proportional to number of live objects in Eden. I also understand that how the live objects are figured out in Major collections (All the
how the live objects are figured out in young generation collection ?
A good high-level description of how generational collection is implemented in HotSpot may be found in this article.
In general a generational collector marks the young generation as follows (assuming we just have two generations):
In HotSpot, old generation objects that contain young generation references are identified using the "Card Table". The old generation is divided into regions of 512 bytes, and each region has a "Card". If the region contains any old -> new generation pointers, a bit in the Card is set. Objects in regions with the Card bit set are then traced during a new generation collection.
The tricky thing is maintaining the Card table as new space references are written to old generation objects. In HotSpot, this is implemented using a software write-barrier that sets the appropriate Card's dirty bit whenever a new space reference is written into the memory region corresponding to the Card. As the linked article notes, this makes setting a reference field in an object more expensive, but it is apparently worth it due to the time saved by being able to collect only the new generation most of the time.
To trace just the youngest generation, the garbage collector scans the same root set (stacks and registers) and ALSO all the older (non-collected) generations that have been modified since the previous young generation scan. Only those objects that have been modified can possibly point at young generation objects, as unmodified objects cannot possibly point at objects that were created after their last modification.
So the tricky part is, how does the GC know which objects have been modified since the last GC? There are a number of techniques that can be used, but they basically come down to keeping track of writes to old generation objects. This can be done by trapping writes (write barriers) or just keeping track of all write targets (write buffers, card marking), all of which add overhead to the execution of the program while the GC is not running (so it doesn't show up as GC pause time, but does show up in the total elapsed time). Hardware support helps a lot, if its available. The tracking need not be exact, as long as every modified older generation object is scanned (scanning unmodified objects is a waste of time, but won't hurt anything).
I am quoting the relevant text from the article by Brian Goetz here.
Tracing garbage collectors, such as copying, mark-sweep, and mark-compact, all start scanning from the root set, traversing references between objects, until all live objects have been visited. A generational tracing collector starts from the root set, but does not traverse references that lead to objects in the older generation, which reduces the size of the object graph to be traced.