4. Improved Reference Counting
The system uses reference counting to provide deterministic reclamation. Aside from the lock-counter described in Section 3, each managed object has another reference counter termed the "extra-counter". It reflects the number of references to it from within managed heap. These two counters forms the whole number of effective references in system to the object. When both the lock-counter and extra-counter become zero, then the object can be reclaimed as ordinary reference counting does. Another smart pointer class "CMemberPtr" is introduced to be a member pointer type of managed object. CMemberPtr contributes to the value of extracounter of the referent, behaving very similar to CLockedPtr.
When an object loses all references to it, namely a zeroreferenced object, it should be reclaimed by the system. Most importantly, the destructor routine of the object should be called so that system resources can be released timely. Since a zero-referenced object may be detected as unreachable garbage, and an unreachable garbage can become zeroreferenced due to the reclamation of its parental objects, special cares should be taken to synchronize the behavior of reference counting and tracing collector. Unlike Java and C#, which finalize garbage in an undetermined order, our approach provides an option to execute destructors of garbage objects in a order that is controlled by application programs. In Section 6, we will address reclamation ordering in detail.
Whereas the system uses a non-deferred reference counting, it successfully removes significant amounts of reference counting maintenance and costs. We noticed the fact that a large portion of reference counting maintenance occur in the program's stacks, including automatic pointer variables, function's parameters and return value. If we do not change the state of lock-counter of managed object, such as remaining a non-zero value, we can change some references in extended root set and remove some reference counting maintenance operations without fear of interfering tracing collector. For example, multiple references from the same thread stack can be merged into one effective reference.
4.1 Using Weak Reference
We introduces weak pointer "CWeakPtr" smart pointer representing raw or dirty access to managed object. Using CWeakPtr does not induce any reference counting operations, just duplicate the address word of managed object. CWeakPtr is invisible to garbage collector. It is not an effective reference and cannot keep referent object alive from reclamation. The rule of using CWeakPtr is simple: only use CWeakPtr to access objects when there is a guarantee that one or more effective references are holding the object alive from reclamation. That is, don't lose all effective references to an object too soon, and remember to keep at least one effective reference to it when using CWeakPtr to access the object. For example, in Figure 1, object A has an effective reference to object B which is supposed always keep a reference to inner object C, then the application can use a dirty pointer, illustrated in dash line, to access object C. On the contrary, reference from B to E is dangerous, because there is no effective reference to keep object E and object F alive. These two objects are circular referenced garbage and may be reclaimed by the time of the next tracing garbage collection. CWeakPtr is strongly recommended to be used when there is one or more effective references holding the referent object alive. Also, CWeakPtr can be used to eliminate circular referenced path by replacing backward effective pointers, like pointers from inner object to outer object, with weak pointers.
Below, we try to eliminate the reference counting cost of passing references between function calls.
First step, in C/C++, when passing an object reference to a function as input parameter, the caller has guaranteed the referent object is alive at the entrances; during the execution of the callee, the caller loses the control until the function returns from callee. We can use CWeakPtr as input type of parameter of reference to a managed object. When the caller passes a CWeakPtr as parameter, it must have guaranteed that the referent object has one or more effective references holding it as the basic rule of CWeakPtr. In single thread environment, the target object is not released during the function execution except the callee releases it (very rare and can be handled by output parameters as described below), since caller loses the control. In a multi-threading environment, the target object might be released by other threads. However, since the other threads might release the object also before control transferred to callee, the caller normally has other mechanisms to guarantee the object is not released at the entrance, such as holding a lock for the object, or just holding a CLockedPtr automatic variable to keep the object alive. Therefore, we can setup a convention that when passing a reference to a managed object to callee as input parameter, the caller should guarantee the object is alive during the execution of the function. Under such convention, we can use CWeakPtr as input parameter of reference. The sub function can use the CWeakPtr parameter as effective reference to object, which is guaranteed by upper-level caller. For instance, the callee can directly pass a CWeakPtr parameter to further lower sub function as input parameter.
By using CWeakPtr as input type of parameter, passing a reference to sub function does not incur reference count maintenance.
4.2 Returning Reference from sub function
When people want the caller of function accept one or more effective references from the function, it can use CLockedPtr as type of output parameters or return type of the function. Function code can assign references to these CLockedPtr variables to keep the objects alive after return, since the final CLockedPtr variables reside in caller's stack frame actually. There might be more than one hidden assignments or transfers from callee to caller, some of which are depend on specific optimization of compiler. This time, these copying or duplication of references are effective reference operations, reference counting maintenances are required. But, we notice that after assigning an effective reference to CLockedPtr variable, the original variable are destroyed with execution goes out of scope. More than one CLockedPtr references in one thread stack referring the same object mostly are just wasting resources, we can remove it by converting "copying" to "moving".
We redefine "CLockedPtr" smart pointer's copy constructor to perform "moving reference" instead "cloning reference" when assigning a CLockedPtr pointer to another CLockedPtr pointer. That is, by default, after such an assignment, the left-side pointer of assignment operator gets the reference and the right-side pointer loses the reference. The "moving reference" action does not change the number of references to the object. There is no reference counting maintenance of the lock-counter. Also, as our tracing collector only care about the state of lock-counter, the "moving reference" action does not change the reference relationship for garbage collector, thus there is no write barrier or other synchronization needed for this "moving" action.
If people want to use a traditional copy assignment, he/she can use a special method "duplicate" of CLockedPtr to return a cloned reference to the object. The "duplicate" operation does create a new reference to the object and cause an increment of the lock-counter. Normally, this method is seldom used since it is rare to have multiple references to the same object from the same thread.
C++ compiler sometimes will automatically generate temporary objects, conduct assignment operations between them, such as implicit conversion of parameters and return value between function caller and callee. Because the default action of the CLockedPtr becomes a "moving reference" when the type of R-value operand is CLockedPtr, any number temporary CLockedPtr object can be inserted between the original pointer and ultimate pointer with no RC cost incurred. For example, we can create a managed object in a very deep sub function routine, and return it to the outermost function. Unlike simple RC costs O(N) increment and decrement operations (N related to the depth of nested function call), there is always one reference from the thread's stack and no lock-counter maintenance operations.
Because our approach changes the value of R-value pointer in an assignment operation, if the R-value expression cannot be dereferenced to certain address and modifiable, some compiler would automatically construct a temporary CLockedPtr instance based on an exported raw reference from the R-value expression, and then use the temporary pointer as R-value of assignment. This does not cause any problem, just introduce one pair increment/decrement operations of lock-counter.
Figure 3 illustrates returning a reference to an object from a function. The initial state is shown in the left part of the figure. In the thread execution stack, the CLockedPtr smart pointer A references an object with one attribute to the lock-counter; CLockedPtr pointer B is in the caller's stack frame, and it can be a local variable, or a temporary object. After returning from the callee, the reference to the object is moved from smart pointer A to B, the value of lock-counter of the object does not change, and no reference counter maintenance cost is incurred.
Therefore, passing a reference to/from a function would not cause reference counting maintenance. More optimizations can be conducted by static procedure analysis, or based on specific application. The rule is, a new effective reference is created only when it is really needed. Use weak pointers instead multiple effective references when possible.