Home > Benchmarks > MicroBench

Cost of Various GC operations

In this section, we will test and measure the cost of various GC operations, including the cost of traversal objects in garbage collection, the cost of assigning a reference to a smart pointer, etc.

In most general cases, these costs could be ignored and programmers should focus on the core algorithm of their application, since that is probably the most effective way to improve the overall performance.

In other cases that these GC costs are so important for the application, programmers can get an estimation from the following testing data, and apply optimization on the critical portion of code to avoid or reduce these costs, such as using Sub Heap concept to reduce traversal cost, and using weak pointers to avoid pointer operations costs, etc.

Garbage Collection (Object Traversal Cost)

Garbage collection system does not know how many objects have become garbage unless the system performs a traversal of all live objects in the system. One traversal may find out lots of unused objects, but may gain nothing in the worst case (all are live and no object is eligible for reclamation). The number of live objects, not the number of garbage, are significantly affecting the overall cost of garbage collection.

In this testing, we created a tree of nodes and kept them alive while performing garbage collection. We changed the depth of the tree to simulate different number of live objects in the system to see how they affect the resulting time spended on garbage collection.

Here is the testing result of HnxGC comparing with Microsoft .NET CLR and BDW conservative GC. The X-axis is the depth of the tree (number of objects), and Y-axis is the time that explicit GC operations cost, displayed in logarithmic scale.


The fundamental cost can be considered as the cost that garbage collecting on zero or very small number of objects. Due to the fact that HnxGC does not need to suspend threads in any phase of garbage collection, the fundamental cost of HnxGC is much smaller that other two participant GC systems. Combined with HnxGC Sub Heap concept, this feature provides people wider fields of using GC. For example, people can now frequently invoke a garbage collection on a small group of objects within a specific Sub Heap. You can even invoke a collection for every object creation, there will be no much performance plunges. It is hard to imagine to do so in a traditional garbage collection system.

A younger generational collection, such as in .NET CLR, does run faster than a thorough collecting, but still too slow (40 times) while comparing with HnxGC collection, and may probably become worse when there are more threads need to be suspended. Here is some data about the fundamental GC cost of various mode of GC systems.

HnxGC                   -    2.141 us
HnxGC (NonRC SubHeap)   -    1.993 us
HnxGC (Object Pool)     -    2.285 us

.NET CLR (Gen-2)        -  164.659 us
.NET CLR (Gen-0)        -   80.720 us

BDW                     - 2177.000 us
See Also:
Download Output Logs of the GC Cost Testing

Pointer Operation Costs

We tested and measured the time of several pointer operations of various types of HnxGC pointers, also we choosed C++ native pointer as manual memory management, BOOST's shared pointer and intrusive pointer as a time base for reference reason.

Initialization Assignment Moving
Native Pointer 0.004 us 0.001 us 0.003 us
BOOST Shared Ptr 0.022 us 0.048 us 0.106 us
BOOST Intrusive Ptr 0.015 us 0.222 us 0.344 us
HnxGC gcptr<>::g 0.014 us 0.017 us 0.007 us
HnxGC gcptr<>::s 0.009 us 0.018 us 0.007 us
HnxGC gcptr<>::mm 0.010 us 0.021 us 0.042 us
HnxGC gcptr<>::weak 0.002 us 0.000 us 0.002 us

The above testing result comes from pointer operations in a multi-threading mode.

People can see that HnxGC non-member pointers support "reference moving" operation, which is about 2-3 times faster than the combination of 2 assignments. Other types of pointers, including native pointer, Boost RC pointers, and HnxGC member pointer, consume almost twice the time of assignment operation as they use 2 assignments to complete a "move" operation.

In this table, there are some initialization cost of smart pointers, from 0.009us to 0.022us. This is mainly due to that Visual C++ compiler cannot fully optimize the constructors of a smart pointer array. It generates codes that invoke a non-inline function. When people use non-array individual smart pointer, the compiler can optimize away all unnessary codes. In fact, the above HnxGC smart pointers only need to clear their contents in initialization, the cost should be roughly the same the amount as a gcptr<>::weak pointer.

Sub Heap

For pointer to an object in a Sub Heap, HnxGC smart pointers behave differently. It introduces more initialization cost on non-member pointer, but trade off the cost of RC in all assignment operations. See the following testing result.

(Sub Heap) Initialization Assignment Moving
HnxGC gcptr<>::g 0.097 us 0.005 us 0.019 us
HnxGC gcptr<>::s 0.013 us 0.005 us 0.018 us
HnxGC gcptr<>::mm 0.008 us 0.002 us 0.002 us

The speed of member pointer operations are improved, but still need to perform GC Barrier operation checking for every assignment operations in order to maintain a consistent state with the concurrently running collector to provide the pauseless feature. So, the assignment operation of member pointer for Sub Heap objects are still a little higher than gcptr<>::weak pointer.

For a global pointer (in root set) to Sub Heap objects, the initialization of this shared pointer needs to acquire an internal exclusive lock, so the cost is so high 0.097us comparing to an in-stack (thread local) pointer gcptr<>::s 0.013us.

Moving operation is completed as combination of an assignment and a clearance operation.

Note - The clearance operation is allowed to run in parallel with the assignment in the internal of a processor, so sometimes the cost of "moving" is the same as "assignment". (Our Global Memory Fence technique guarantees the correctness of this type of operation)

See the following two coding styles, the left one is about 4-5 times faster than the right one in "moving" reference within a pointer array in an object. It also shows that a properly using of "Pointer Arithmetic" may boost the performance significantly than a "safe" and "Non-Pointer-Arithmetic" world, such as Java and .NET

Auto Typed Ptr

In above testings, the HnxGC managed pointers are of explicitly specified type, such as gcptr<>::g or gcptr<>::mm. If you just use gcptr<> without particular type specifier, the system will automatically detect and choose a proper type. That will introduce extra costs at pointer initialization for type detection, and costs at later various pointer operations due to the runtime action of type identifying.

Here is the testing result.

(Auto Typed) Initialization Assignment Moving
As Global Ptr 0.015 us 0.020 us 0.016 us
As In-Stack Ptr 0.009 us 0.020 us 0.015 us
As Member Ptr 0.010 us 0.026 us 0.054 us

(Auto Typed + Sub Heap) Initialization Assignment Moving
As Global Ptr 0.215 us 0.007 us 0.025 us
As In-Stack Ptr 0.021 us 0.007 us 0.025 us
As Member Ptr 0.011 us 0.005 us 0.007 us

Object Creation

Creating an object in HnxGC is a combination of multiple actions, including allocating memory space, initializing internal data structure, such as reference counter, associating the allocated memory block with specified object class, synchronizing between mutator and collector, etc.

The speed may be affected by various factors, such as how soon the objects are released. For example, from the following testing data, people can be aware that adding extra `delete' operation is faster (0.481 us) than just 'new' an object (0.584 us) without `delete'.

In the following testing, we measured and compared the costs of creating 10000 small objects in various ways. Basically, the speed of HnxGC is around the same grade of standard C++ manual management; after optimization with Object Pool, the speed of HnxGC is boosted to the level of .NET CLR and BDW GC, even faster.

          HnxGC  : 0.495 us
 HnxGC (SubHeap) : 0.522 us
    HnxGC (Pool) :              0.035 us
  HnxGC (Native) : 0.620 us
           C/C++ : 0.584 us
C/C++ (w delete) : 0.481 us
          BDW GC :              0.054 us
          CLR GC :              0.053 us

With increasing size of objects, the factor from how soon the objects are collected is becoming more important as the memory footprint and cache missing are playing more important roles concerning the overall performance. Other testings will reflect the fact more accurately.

See Also:
Download source code of these testings
The hardware and software environment for the testings
Home | Download | Terms of Use | Privacy Statement | Contact Us
Copyright@ 2008 hnxgc.harnixworld.com, All rights reserved.