The Sun HotSpot guys have been working on a new garbage collector called G1 to replace CMS. This presentation went over the differences between the old CMS collector and the new G1 collector and also included some perspective from a guy at the Chicago Board Options Exchange (CBOE) who has been beta testing it.
CMS divides the world into the young and old generations. This is done to take advantage of the observation that the lifetime of objects is highly uneven – the vast majority of objects die young glorious deaths and a very small number of objects live for a very long time (effectively the life of the app). Also important is that there tend to be very few references from the old generation to the young generation. Because of this, it’s ok to focus our collection attention on the young gen.
In CMS, new objects are created in the young generation, which is further broken up into eden and two survivor spaces. A young gen GC finds the live objects and copies each one either to a survivor space or to the old generation, depending on its age. Old gen GC is mostly concurrent but needs stop-the-world pauses to finish up, and it also stops the world for reference marking. Old gen collection does not compact, so the old gen becomes fragmented; the sweep finds the holes and manages them in free lists. There is a fallback to a full stop-the-world collection with compaction.
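To make the young gen aging rule concrete, here is a minimal Java sketch of a young collection. It is a toy model, not HotSpot code: the class name `YoungGenSketch`, the fixed `TENURING_THRESHOLD` of 3, and the boolean `live` flag (standing in for real reachability from roots) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a young-generation collection: live objects age in a
// survivor space and are promoted to the old generation once old enough.
class YoungGenSketch {
    // Hypothetical threshold chosen for the example only.
    static final int TENURING_THRESHOLD = 3;

    static final class Obj {
        final String name;
        final boolean live; // stand-in for "reachable from a root"
        int age;            // number of young collections survived

        Obj(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> survivor = new ArrayList<>(); // the occupied survivor space
    final List<Obj> oldGen = new ArrayList<>();

    void allocate(String name, boolean live) {
        eden.add(new Obj(name, live));
    }

    // One young collection: copy live objects out of eden and the occupied
    // survivor space; anything not copied is garbage.
    void youngCollect() {
        List<Obj> toSpace = new ArrayList<>(); // the empty survivor space
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(survivor);
        for (Obj o : candidates) {
            if (!o.live) {
                continue; // dead objects are simply left behind
            }
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                oldGen.add(o);  // old enough: promote
            } else {
                toSpace.add(o); // still young: keep in a survivor space
            }
        }
        eden.clear();
        survivor = toSpace; // the two survivor spaces swap roles
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch();
        heap.allocate("cache", true);
        heap.allocate("temp", false);
        for (int i = 0; i < 3; i++) {
            heap.youngCollect();
        }
        System.out.println("old gen size: " + heap.oldGen.size());
    }
}
```

The point of the sketch is that dead objects cost nothing to collect: only the survivors are touched, which is why the uneven lifetimes described earlier make young collections cheap.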
G1 (“garbage first”) takes a different approach: all memory (except perm gen) is broken into 1 MB “regions”. The young and old generations are each made up of a set of non-contiguous regions, and the membership of each set changes over time. During a young GC, the survivors in a region are copied either to a new young gen region or to an old gen region, as appropriate.
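The region idea can be sketched in a few lines of Java. This is only an illustration under my own assumptions: `RegionHeapSketch`, the `Role` enum, and the first-free-region policy in `claim` are invented for the example and say nothing about how G1 actually picks regions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy heap split into fixed-size regions; each region is free or belongs
// to the young or old generation, and that assignment changes over time.
class RegionHeapSketch {
    enum Role { FREE, YOUNG, OLD }

    static final int REGION_BYTES = 1 << 20; // 1 MB regions, as in the talk

    final Role[] roles;

    RegionHeapSketch(int heapBytes) {
        roles = new Role[heapBytes / REGION_BYTES];
        Arrays.fill(roles, Role.FREE);
    }

    // Hand the first free region to a generation (hypothetical policy).
    // Returns the region index, or -1 if no region is free.
    int claim(Role role) {
        for (int i = 0; i < roles.length; i++) {
            if (roles[i] == Role.FREE) {
                roles[i] = role;
                return i;
            }
        }
        return -1;
    }

    void release(int index) {
        roles[index] = Role.FREE;
    }

    // Indices of all regions currently assigned to the given role.
    List<Integer> regionsOf(Role role) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < roles.length; i++) {
            if (roles[i] == role) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RegionHeapSketch heap = new RegionHeapSketch(8 * REGION_BYTES);
        heap.claim(Role.YOUNG);
        heap.claim(Role.OLD);
        heap.claim(Role.YOUNG);
        heap.claim(Role.OLD);
        System.out.println("young=" + heap.regionsOf(Role.YOUNG)
                + " old=" + heap.regionsOf(Role.OLD));

        // After a young GC empties region 0, it can come back as an old region.
        heap.release(0);
        heap.claim(Role.OLD);
        System.out.println("young=" + heap.regionsOf(Role.YOUNG)
                + " old=" + heap.regionsOf(Role.OLD));
    }
}
```

Neither generation owns a fixed address range here: a generation is just whichever regions currently carry its label.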
In G1's old generation GC, there is one stop-the-world pause to mark. If any region is found to contain no live objects, it is immediately reclaimed (this happens more frequently than you’d expect due to locality). The live objects in the remaining old regions are then compacted into a new old region. Old gen collections are piggybacked on young gen collections.
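Here is a rough Java sketch of that old gen cleanup arithmetic. The names `OldGenCleanupSketch`, `Result`, and `cleanup` are mine, and the sketch assumes live data can be packed perfectly into the target regions, which a real collector cannot guarantee.

```java
// Toy model of what happens to old regions once marking has produced a
// live-byte count per region: empty regions are reclaimed outright, and
// the live data in the rest is copied (compacted) into fresh regions.
class OldGenCleanupSketch {
    static final long REGION_BYTES = 1L << 20; // 1 MB regions, as in the talk

    static final class Result {
        final int reclaimedEmpty;         // regions freed without copying anything
        final int regionsAfterCompaction; // regions the surviving data occupies

        Result(int reclaimedEmpty, int regionsAfterCompaction) {
            this.reclaimedEmpty = reclaimedEmpty;
            this.regionsAfterCompaction = regionsAfterCompaction;
        }
    }

    static Result cleanup(long[] liveBytesPerRegion) {
        int empty = 0;
        long live = 0;
        for (long bytes : liveBytesPerRegion) {
            if (bytes == 0) {
                empty++;       // no live objects: reclaim immediately
            } else {
                live += bytes; // live objects: must be copied
            }
        }
        // Assumption: perfect packing, so round the live total up to whole regions.
        int after = (int) ((live + REGION_BYTES - 1) / REGION_BYTES);
        return new Result(empty, after);
    }

    public static void main(String[] args) {
        // Five old regions: two hold only garbage, three are sparsely populated.
        long[] liveBytes = {0, 300_000, 0, 600_000, 100_000};
        Result r = cleanup(liveBytes);
        System.out.println("reclaimed without copying: " + r.reclaimedEmpty);
        System.out.println("regions after compaction: " + r.regionsAfterCompaction);
    }
}
```

In the example, five old regions shrink to one, and two of them are freed without copying a single byte, which is the cheap win that the locality remark refers to.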
G1 tracks references into a region with a technique called “remembered sets”. Every region has a small data structure (<5% of total heap) that reduces the work needed to do marking. A region's remembered set contains all external references into that region (references within the region are not included).
After this initial layout by Tony Printezis (who was entertaining and explained things well), Paul Ciciora talked about how they test things at CBOE. Probably most important, Paul said it is still a work in progress and not production-ready yet. One interesting item from the Q&A was that this will definitely be in Java SE 7 (probably committed in the next few weeks) and that it will also be released in a Java 6 update.
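Going back to the remembered sets for a moment, the bookkeeping rule can be sketched in Java like this. Everything here is an assumption for illustration: the class `RememberedSetSketch`, the `recordReference` hook, and the "region:slot" string encoding are invented, and the real structure is certainly far more compact than a map of string sets.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy remembered sets: for each region, record which locations in OTHER
// regions hold references into it. References that stay inside one region
// are not recorded.
class RememberedSetSketch {
    // target region index -> source locations, encoded as "region:slot"
    private final Map<Integer, Set<String>> rememberedSets = new HashMap<>();

    // Hypothetical hook called whenever a reference is stored: the slot
    // `slot` in `fromRegion` now points at an object in `toRegion`.
    void recordReference(int fromRegion, int slot, int toRegion) {
        if (fromRegion == toRegion) {
            return; // intra-region references are not tracked
        }
        rememberedSets
                .computeIfAbsent(toRegion, k -> new HashSet<>())
                .add(fromRegion + ":" + slot);
    }

    // External locations that point into the given region. A collector can
    // scan just these instead of the whole heap.
    Set<String> incoming(int region) {
        return rememberedSets.getOrDefault(region, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch heap = new RememberedSetSketch();
        heap.recordReference(1, 0, 2); // region 1 -> region 2: recorded
        heap.recordReference(2, 5, 2); // region 2 -> region 2: ignored
        heap.recordReference(3, 4, 2); // region 3 -> region 2: recorded
        System.out.println("references into region 2: " + heap.incoming(2).size());
    }
}
```

With a structure like this, collecting one region only requires looking at that region's incoming set rather than scanning every other region for pointers into it.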