6. Other Aspects
6.1 Compiler Optimization and Memory Access Ordering
Obviously, the ordering of RC increment and decrement can not be reversed. Suppose we want to move a reference from pointer PB to pointer PA as pseudo C code showed in Figure 6, compiler may optimize to place the store instructions out of order. What will happen if GC starts as showed in the figure? If both PA, PB are of CLockedPtr type, the moving operation does not incur lock-counter changes, register eax can be ignored without problem. In other cases, these operations will cause reference counter changes and implementation should ensure strict ordering. (Fortunately, RC increment/ decrement operations ensure memory ordering on most modern platforms)
Also, the example in Section 5 using a global flag F1 to indicate the mark phase in process. When a mutator creates a new reference, it detects the flag and records down the referent object. Since the collector sets the F1 under the protection of locking instructions that force stronger ordering on most processors, we only need to guarantee that the write barrier is occurred after RC increment operation and before the last reference known to the collector dropped. In fact, we just perform write barrier before the RC increment function returns.
6.2 Garbage Reclamation Ordering
Without a specific finalization ordering, such as in Java and C#, a finalization method should not access any other managed objects, even with holding effective references to them. An effective reference cannot guarantee the referent alive when the finalization method is called, because they may be all in an unreachable object set, and the required object may have been reclaimed and may no longer be functional. This limitation lead to the finalization not much useful in practice.
Herein we present a solution to reclaim garbage in a controllable ordering. When unreachable objects are determined either by tracing collector or by mutators that detect reference counts dropped to zero, our mission is to reclaim them, including destructing objects (calling objects' destructors) and releasing underlying memories. We do not guess the order to reclaim objects, we let the application to help us to determine the order.
Unlike an object in a working state that it should only reference to functional managed objects, we allow an object in destructing state referencing to destructed non-functional objects, but operations on these references are limited to just releasing the references. Accessing a destructed nonfunctional object would lead to unpredictable results, but the system control block of the object can be still valid and can support operations like decrementing the extra-counter of the object. So, we postpone the releasing of the underlying memory after executing the destructor of an object, the underlying memory is freed only when all references to it are dropped.
In our approach, not every unreachable objects can be destructed. An object is destructible only when the object is unreachable and there is no other objects depend on it. Once an object is determined being unreachable, we give a last chance to the object to declare what it wants and what it does not. An object can use this chance to declare which objects it want to be kept alive for executing the destructor, namely destruction dependence. (At the same time, people can declare a request of custom destruction, such as to run destructor in a specific thread or in a windows message loop. Resurrection can also be requested here as further discussed in next subsection). A zero-referenced object always means there is no other objects depending on it. So, zero-referenced object are always destructible and the underlying memory can be freed. Before an object's destructor is called, the object should remain fully functional, and after the destruction, no access is allowed to the destructed object.
The destruction dependence forms a graph of relationship very like the graph of reference relationship. But, if there is existing a correct destruction ordering, the destruction dependence graph must be a directed acyclic graph(DAG). Circular dependency is not allowed. If it does exist, it means erroneous design of application, consequently the system would just log a warning message and skip executing the destructors of those circular dependent garbage. By keep destructing zero-dependent garbage, all objects of the DAG can be destructed. If all unreachable objects are destructed, all references between them are lost and consequently all underlying memory have been freed as all objects are zeroreferenced.
In one implementation, we conduct reclamation of unreachable objects in two phases.
First, the preparation phase, we iterate all unreachable objects and call the "OnReclaim" method of each object. In the "OnReclaim" method, the application code informs the system which objects it needs to be kept alive to run the destructor. The system in turn maintains a destruction dependent count (DDC) of every object for the number of destruction dependences.
Second, the reclamation phase, the system finds out zerodependent (zero DDC) unreachable objects and calls destructors of them. The destructor of an object should clear all references and dependences to other objects. Thus, during this phase, more and more objects become zero-dependent or even zero-referenced. Memories of zero-referenced objects are freed, and zero-dependent objects are destructed. The system repeats until there is no more zero-dependent objects to be destructed. Memories of circular dependent objects and their descendants can be freed without executing their destructors.
By the mechanism described above, the system guarantees that the destructor of an object can access the fully functional referent, which is declared as destruction dependent object in the "OnReclaim" method call. In addition, dropping the last reference to the dependent object in destructor will nest the execution of destructor of the referent, the same as in a simple reference counting environment.
6.3 Resurrection
Although we think resurrection is not much useful since most prevasively used reference counting practices do not support it, we do provide resurrection functionality as follows.
Unlike Java and C# resurrect objects after finalization, we never re-use an object once it is destructed. But, as describe above, before the system calls the destructor, the system calls "OnReclaim" method. In the "OnReclaim" method, application can stop the proceeding of destruction, put a pointer to the object back to live world, then the object is reachable from the application code. This is a special case to "object unreachability" in our approach, the unreachable object can be reached by 'this' pointer, and the application can reach the object again from then on.
Once the object is resurrected, all garbage it references are resurrected also. So, resurrection is a very expensive operation and should only be used when it is truly needed. If you want to destruct an object by your own way, use "Custom Reclamation" instead. "Custom Reclamation" does not resurrect other referenced objects but keeps destruction dependent objects alive until the object is destructed. "Custom Reclamation" is much cheaper and objects can be reclaimed in the same tracing collection.
6.4 Object Model
Unlike garbage collector in Java and C#, one aspect of our approach is that we actually do not trace references to objects. Instead, we trace references to memory blocks that contain objects. A managed object is nothing but a normal object with some smart pointer fields and a traverse routine. There is no special data structure needed to inherit, such as a global base class "Object" in Java and C#. Managed object are exactly having the same structures as user defined. System control data, including lock-counter and extra-counter, are not associated with object but with the underlying memory block. Therefore, we allow virtually any C/C++ object aggregation, such as array of objects(not limited to primitive types), multiple-inheritance, and member object fields, etc. Giving an example in traditional C/C++, when a block of memory is allocated for an array of integer, the application program should remember, or be able to figure out, the beginning of the memory block. We are the same, our approach requires the application program to tell the system all addresses of managed memory blocks it uses. This design is more compatible with traditional C/C++ data structure, and is more efficient since it reduces the number of standalone memory blocks and has less indirect pointer accesses. Under this architecture, it is easy to support garbage collection for Microsoft COM without losing binary compatibility.
6.5 Interior Pointers
Normally, references to the interior of objects are ignored by the garbage collector. The application program can translate the interior reference back to the address of underlying memory, and tell the system in the traverse routine. There are many methods to determine memory block by arbitrary interior address. However, some of them require particular memory management, such as group objects by size in page, or using a Card. Also, the costs of these algorithms are still so high comparing to application level translation. We recommend to use application level translation. For example, people can use virtual table, such as COM interface to translate an interface address to the outermost complete object reference. Or, they could use CONTAINING RECORD macro to convert an interior address to object reference.
7. Performance Analysis
The cost of the system mainly comes from reference counting and tracing collection.
Write barrier is introduced to every reference counting maintenance. Fortunately, the intricate and costly part of write barrier only occurs when a concurrent tracing collector is in process, and tracing collection need not to be executed as frequently as prior designs since reference counting has already reclaimed many zero-referenced objects. Also, significant portions of reference counting operations containing write barrier are removed. Therefore, the cost of write barrier does not as high as people first thought.
The tracing collector scans all objects for marked-out root objects. This might cost more (or not) than scanning root-set, depending on the size of root-set and the number of managed objects. Because the tracing collection runs infrequently or runs when the system is idle, the efficiency of collector does not matter much. Further, because there is no code or data structures injected to describe root set references, and zeroreferenced objects are reclaimed earlier, the working set of the application is reduced considerably and in turn improve the performance of the whole system.
This system totally removes suspensions of application threads. The delay only comes from the contention for shared data structures among the collector and mutators. The worst case is predictable, ranging from tens to hundreds instruction cycles at about micro-second level on stock hardware.
The programming style keeps reminding programmer the destruction of variables and objects, it helps to eliminate inadvertent application logics that lead to memory leaks even in Java and C#. The correct way is removing effective references as soon as possible after use. With the evolution of application program, deeper level of optimization can be applied, making it closer to the high level of efficiency in manual management. Accessing native objects in this system are so straightforward and efficient, there is no need of proxy / thunk to access native objects. C++ features, such as union, template, bit fields, multiple inheritance, are directly supported. More resorts are provided for application program to achieve better performance.
8. Conclusion
This garbage collection can perform accurate collection in an uncooperative language environment, such as standard C++. It eliminates root-set scanning by marking out root objects dynamically. A full concurrent garbage collection can provide predictable micro-second latency of worst case on stock hardware. Improved non-deferred reference counting technique significantly reduces the cost of reference counting, and the system is efficient both in time and space. Deterministic reclamation is provided to be compliant with object oriented programming idiom - RAII (Resource Acquisition Is Initialization). Many features are supported in the system, such as union, bit-fields, garbage reclamation ordering, multiple-inheritance, resurrection, etc. It is a pure software based collector with little system service requirement, so it can be used in a wide array of platforms.
A future work direction is to investigate applying more optimizations to the system, such as to reduce the write barrier synchronization costs, reduce reference counting operations by static analysis tools, etc. Remote garbage collection of this algorithm and tools for rapid software development are also a direction in the future.