3. Algorithm - Marking out Root Objects
To be an accurate garbage collector, the system must know all effective references to every managed object while considering the graph of reference relationship. Effective references to a managed object may come from either inside or outside the managed heap (the logical collection of all managed objects). References inside managed heap, such as pointer fields of managed objects, can be easily described by a "traverse" method. To determine references outside the managed heap, especially the program's stack and processors' registers, is not so easy. Since the outside world changes rapidly and frequently, tracing collection must kick in only when the state is safe. A GC-safe state means the application is safe to allow garbage collection to start. An application thread in GC-safe state always makes all its references known to the collector. Following are some common methods to make the collector aware of these references.
Thread Hijacking - The system modifies the return address of a function in thread stack and makes it point to a special address. When the current function returns, a special procedure executes and suspends the thread for garbage collection.
Inserting Safe Points - The compiler inserts calls to a special function that checks whether a garbage collection is pending. If so, the thread is suspended and waits for collection. Inserting safe points in long loops ensures any thread can be advanced to a safe point in bounded time.
Layout Tables - The compiler emits description tables for the garbage collector to determine all references at every point in a function. The table normally includes descriptions of CPU registers usage, as in debug information. This information makes the collector aware of all references at any place, as in a debugger. Therefore, application threads is fully interruptible. However, this information is bloated and impedes the performance due to a large footprint.
There are some other approaches, but most of these methods require compiler supports in forms of providing extra codes or data structures for the garbage collector. Not every compiler provides these supports, and among those supports, there are so many different vendors and so many different implementations, lacking a standard to regulate their behaviors. Although a conservative garbage collection can work in an uncooperative environment, but it may leave unpredictable amount of garbage un-reclaimed, and much worse, may erroneously reclaim live objects at some particular scenarios.
3.1 Basic Idea
We choose to avoid the root-set scanning completely. Instead scanning for references in the root-set, including application threads' stacks and contexts, we dynamically mark out managed objects that are referenced from root-set; when at tracing collection, the system identifies these marked-out objects and begins traversal from them as root objects to determine unreachable garbage. The collector only needs to consider the inside of manage heap, unloaded the burden of analyzing the complicated data structures outside. There is no requirement for compiler to generate particular data or codes, and as a result, it is widely portable to various C++ compilers and other languages. The collector needs not to trace the rapid changes occurred outside the managed heap, and optimizations outside the managed heap can be freely applied. For example, optimization of moving a reference from a local pointer variable in program's stack to a CPU register does not change the state of managed heap, and does not affect the consistency of reference relationship from the point of view of collector. For the same reason, references from outside of managed heap can be stored in union, bit-fields, pointer offset, or other hidden structures.
Before proceeding detailed description of our approach, we first explain some terms and concepts.
First, we define the term "extended root-set" as all area except managed heap. Managed heap is referred to as a collection of managed objects, not necessarily being in a particular contiguous area of memory. For example, people can directly use the traditional memory manager to allocate space for managed object, such as "malloc" in C or "new operator" in C++. In this case, managed objects are mixed in place with unmanaged native objects. Managed object are those whose lifetimes are managed by the garbage collection system. Garbage collection system allocates memory space for managed object and an associated control block as HEADER showed in Figure 1, and automatically releases the memory when the object is no longer being used. For example, an application program can invoke a "gcnew" operation, or the like, to request the garbage collection system to create a managed object of specified class. The system will return an address or a handle of the created object. The object belongs to managed heap no matter where it actually resides.
The "extended root-set" comprises memory areas such as application thread stacks, static data sections and native heaps. From a C/C++ programmer's perspective, "extended root-set" includes global, local, static, automatic variables, function parameters and return value, and member variables of unmanaged objects in the above area and native heap. In another words, it contains all variables except member variables of managed objects.
An alternative definition of "extended root-set" is that, except the effective references that will be traced by garbage collector, all other effective references are said to be within the "extended root-set". Effective references are references that will keep the referent alive and from reclamation. This definition is more accurate especially in a system that allows tracing into native objects, in another word objects are exposed to tracing collector but not under management of garbage collection system. For example, some objects in the native heap or static storage section can be exposed to garbage collection system, so their referents and descendants can be traced and handled properly.
A reference from the extended root-set will cause the referent to be marked out as active. For example, an automatic pointer variable in a C/C++ function will cause the referent to be marked as active. The system should monitor references from the extended root-set, maintain a consistency or a conservative consistency. That is, if there is an effective reference from extended root-set, the referent should be marked out as an active object; if there are no more references from the extended root-set, the referent should be put back to inactive state. A conservative consistency is referred to as, from the view of garbage collection, allowing some garbage objects to be "leaked" from the current garbage collection and be collected eventually later. It allows some delays between object state and the state of references from the extended root-set in order to improve performance. For example, when an object has no references from the extended root-set, it can remain in the active state for a while.
A straight way of marking out active objects is using a reference counting technique. The system maintains a "lockcounter" for each managed object, where the value of the counter represents the number of references from the extended root-set, and indicates whether the object is marked or not as a flag. If a reference from the extended root-set is created, the value of the lock-counter is incremented; If a reference from the extended root-set goes out of scope, the value of the lock-counter is decremented; a positive value of lock-counter means the object is active, or referred to as locked state; if the value of the lock-counter is zero, then the object is not in a locked state.
3.2 Detailed Approach
In our implementation, we provide a set of smart pointer template classes in C++. Each type of smart pointer has its own specific scene to apply for, in relation to the purpose of pointer usage and the location of pointer variable. In another words, there may be several different types of smart pointers pointing to the same type of objects.
One of these template classes is "CLockedPtr<T>", which always keeps the referent object active (locked), and prevents it from reclamation. CLockedPtr performs automatic reference counting on the lock-counter of the referent. Every CLockedPtr pointer is a root holder of the referent object, contributing to the value of lock-counter of the referent. The system always treats an object with non-zero lock-counter as active (locked) and alive, and never reclaims it either by reference counting module or tracing collector module.
Tracing garbage collection can be based on the above marked out active objects. Every managed object associates with a traverse routine, which is provided by application program at the moment that the object is created. Traverse routine will inform the garbage collection system of which objects are referenced by the specific object. It can describe the reference relationship of the object statically or dynamically. Normally, each class should have its own traverse routine defined. Some C/C++ macros are provided to help manually define the routine, and more intelligent compiler tools can automatically generate the routine for most general cases. Here is an example.
class MyClass { ... };
GC_TRAVERSE (MyClass)
{ // define traverse routine of MyClass
// inform system the descendant object from member variable m_pNext
GC_TRACE_PTR(m_pNext);
// inform system other descendants
... ...
}
Traverse routine only need to inform the system the address of the referent object, not the address of the member pointer. Therefore, not only can it obtain the address directly from member pointer, but also it can deduce the address from one or more member variables. Moreover, the traverse routine can call other functions to get the address of the referent object. The collector need not to know the actual internal structure of the traversing object, the object can have any complicated structures, including bit field, union, hidden references.
Native data structures, also, can be described by the traverse routine and traced by garbage collector. Native data structures herein are those defined and created by any native language, such as former C++ or even assembly language without GC support. For example, a well-defined opaque native binary component can be traced by our garbage collector, and the managed objects it referenced are kept alive. In fact, the difference between managed object and native object in our approach is not as much as in some virtual machines. As well as the collector can trace into native objects, an instance of managed class can be created in a native memory block and under the manual management of application program. For example, an instance of managed class can be put into the static storage area as a permanent object.
We give a stop-the-world collector here in this section, and a concurrent pause-less collector in section 5 later.
Figure 2 is a flowchart of stop-the-world tracing collector. First it suspends all application threads; then it switches all objects to color white; and then it scans all objects, determines root objects by their positive value of lock-counter, marks them as black; after or at the same time, it starts traversal from these root objects, marks reachable objects as black; finally, when all root objects and its descendants have been traced and marked black, it resumes application threads, and reclaims those white objects.
The system can immediately suspend all application threads and start tracing garbage collection. There is no need to wait for them to enter GC-Safe state, because the application is always in GC-Safe state. The reference counters' increment and decrement operations are atomic, thus at any time an object has only two possible state, with a non-zero lock-counter or with a zero one. With correct increment and decrement logical operation ordering (i.e. create new reference before drop the old one, more details are discussed in following sections), an object is always in correct state. There is no transition inconsistency, and the garbage collector can always obtain a correct and consistent state of the reference relationship. Therefore, all application codes are always in GC-Safe state, and garbage collection can start at any place.
Because there is no root-set scanning, the root-set can keep changing while concurrent garbage collection is in process. This will be explained in Section 5 addressing concurrent garbage collection.
This "stop-the-world" collector can perform accurate garbage collection in an uncooperative language environment. In the following sections we will address the questions how we remove a significant amount of RC operations and costs, provide deterministic reclamation, build a concurrent pause-less collector and other issues.