Home > Benchmarks > GCBench

GCBench testing

GCBench is an artificial garbage collector benchmark that was originally written by John Ellis and Pete Kovac of Post Communications, and was then heavily modified by Hans Boehm, then at SGI. You can get more information from http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_bench.html. We modified it further to support HnxGC in Win32, and make it possible to use a single source code for .NET GC, BDW GC and HnxGC.

This testing program tests the performance of target memory management system by creating lots of garbage consisting binary trees of various depth. Here are some quotes from Hans Boehm.

      This is no substitute for real applications.  No actual application
      is likely to behave in exactly this way.  However, this benchmark was
      designed to be more representative of real applications than other
      Java GC benchmarks of which we are aware.

      It attempts to model those properties of allocation requests that
      are important to current GC techniques.

      It is designed to be used either to obtain a single overall performance
      number, or to give a more detailed estimate of how collector
      performance varies with object lifetimes.  It prints the time
      required to allocate and collect balanced binary trees of various
      sizes.  Smaller trees result in shorter object lifetimes.  Each cycle
      allocates roughly the same amount of memory.

      Two data structures are kept around during the entire process, so
      that the measured performance is representative of applications
      that maintain some live in-memory data.  One of these is a tree
      containing many pointers.  The other is a large array containing
      double precision floating point numbers.  Both should be of comparable
      size.

      ... ...

Basic Testing

In our testing environment, we got following testing results: (download and view the details of testing programs' output)

Data are displayed in seconds of execution time for single-threading and multiple-threading mode.
(ST / MT) C/C++ Manual
Management
HnxGC Microsoft
.NET Framework
Hans Boehm
Conservative GC
Boost
Intrusive Ptr
Boost
Shared Ptr
Standard 16.813 / - 12.953 / 18.656 1.594 / - 2.203 / - 19.078 / 30.140 45.235 / 46.328
Tuned up
(Maximum Optimized)
0.656 / 6.766 0.906 / 0.937 - - 1.906 / 19.468 20.500 / 26.172

Explanations

The "Standard" mode is based on Boehm's original GCBench, no fundamental changes.

The "Tuned Up (Maximum Optimized)" testing data were coming from programs that we had tried our best to gain the maximum performance. For traditional C/C++ manual management, we refined it to use BOOST's quick_allocator (see boost/detail/quick_allocator.hpp) instead standard new/delete operator in C/C++ runtime library; for HnxGC, we rewrote the program to use Object Pool mechanism; For .Net CLR and BDW, no measurements were applied; for BOOST's intrusive reference counting and shared pointer, we used BOOST's quick_allocator instead.

For multi-threading, all of BOOST's quick_allocator, intrusive and shared smart pointer solutions are required to recompile the source code with some different macro defined to instruct different behaviors, or rewrite some code; for HnxGC, the system will automatically detect more than one thread started, and switch to multi-threading core internally at the runtime, no need to recompile your code; for .NET CLR and BDW, there is also no need to rewrite the source either.

In the "Tuned Up" multi-threading mode, HnxGC Object Pool mechanism used a private pool for each thread's allocation to avoid the cost of locking such as in BOOST's quick_allocator. People may think a thread private allocator can also be applied to BOOST quick_allocator to avoid locking cost, but please notice that, if BOOST quick_allocator is located in a thread's private scope, it requires objects to be released (or deleted) just in the same thread. Therefore, other threads can access the objects, but cannot take the responsiblility to control the life time of these objects. That's a big difference from standard C/C++ manual memory management, and will limit the fields the technique applies or cause hard-to-find obscure bugs.

On the contrary, HnxGC's Object Pool allows objects created in one thread's private pool to become unreachable in other threads. All pools are protected and thread-safe in multi-threading environment, and only the allocations from an Object Pool are thread private. The thread creating an object is free to transfer the object reference to other threads via shared pointer, etc. The object can be used as it were created in normal thread-safe way, and it will be collected by the system after the last accessible path vanished no matter in whatever threads.

Analysis

To compare the maximum performance of these testing systems, we built the cost of time and performance chart based on the above data as follows:

From these graphs, people can easily notice that HnxGC is fastest in multi-threading mode among these testing system, especially it is nearly 2 times faster than the second one - the Microsoft .NET Common Language Runtime, which is already very fast (about 10 times faster) compared to traditional C++ new/delete runtime library. HnxGC overwhelms .NET CLR in this case, mostly due to the fact that HnxGC object allocations in Object Pool are linear and in-lined, and there is less costs of moving objects during the garbage collection.

For single threading mode, it looks like that the manual management C++ program using BOOST quick allocator is the fastest. However, people should be aware of that BOOST quick allocator (Boost QA) has some drawbacks while comparing with HnxGC. First, as we mentioned above, in multi-threading environment, Boost QA need synchronization locking and the cost will high; Second, the testing only reflect the objects of the same type or class, if the allocations are of various size and interlaced, linear allocation such as HnxGC and .NET CLR will perform better for less cache missing; Third, Boost QA does not release memory and always retain at the peak memory usage even all objects are released, while HnxGC Object Pool will release unused memory in BLOCK (about 256KB) immediately during the execution. Moreover, HnxGC provides APIs to pack and move out live objects from a pool, so that all the memory of the pool can be released. No more worries about several small objects causing the whole BLOCK retained.

Considering the testing results of HnxGC without Object Pool but reference counting, we can compare HnxGC to BOOST's reference counting smart pointer approaches. Reference counting (RC) has its own advantage on large object or object with associated scarce resources, since it can release object immediately. However, reference count maintenance is not trivial, especially in a multi-threading (MT) environment. We have done a lot of works on reducing the cost of reference count maintenance, that's why HnxGC has outperformed BOOST shared / intrusive smart pointers, especially in MT environment. See the following graph of HnxGC with these two BOOST RC approaches.

Larger Object

In the above basic and original testing, all nodes of various depths of trees are of very small size, 2 words (8 bytes in x86 32-bit system). In this case, we increased the node size to around 520 bytes (as Boehm did in his code), and reduced the total number of garbage to keep the program not use too much memory (around 10 MB of live objects). Here is the result:

Data are displayed in seconds of execution time for single-threading and multiple-threading mode.
(ST / MT) C/C++ Manual
Management
HnxGC Microsoft
.NET Framework
Hans Boehm
Conservative GC
Boost
Intrusive Ptr
Boost
Shared Ptr
Larger Object Size0.375 / 0.391 0.453 / 0.515 0.406 / - 0.672 / - 0.390 / 0.578 0.609 / 0.625
Larger Object Size
(Tuned)
0.172 / 0.219 0.266 / 0.297 - - 0.172 / 0.344 0.453 / 0.562

Basically, the costs come from the memory allocation and cache missing. As we can see there is no very significant difference among these systems in non-tuned mode (the first data row of table).

Using Object Pool for HnxGC, and BOOST quick_allocator for manual management and boost RCs, did improve the performance about 1/2 to 1/3, but not as much as it did in small object case.

Finalize

When we were profiling HnxGC, we found that the cost of executing destructors of objects is not trivial. Reference counting technique does cause almost every object has a destructor or similar behavior to do, like decreasing the reference counters of the objects it references. Generally, a non-RC automatic memory management needs not to do such work, this is one of the reasons that .NET outperforms RC approaches. We were so curious and did a testing trying to find out "How does .NET CLR perform if it was forced to do some destructing work."

We add a trivial action for every node to do in their destructor in HnxGC, and in finalizer routine in .NET CLR. As we can expected, the execution time of the new testing program does not increased much for HnxGC; but as for .NET CLR, it apparently losses its predominance situation and even slower than our RC approach. See the following table and graph for details.

Before Now
HnxGC 12.953 13.032
.NET CLR 1.594 19.313

Conclusions

In this GCBench case, the performance of HnxGC without optimization is at around about as standard C++ manual management, faster than BOOST's two reference counting approaches, but about 10 times slower than .NET CLR, because .NET uses a linear and thread private approach for object allocation.

To achieve the maximum performance, poeple can use an optimized HnxGC (with a little sacrifice of coding convenience), then HnxGC can become the fastest one among these testing systems in a multi-threading mode, at nearly twice faster than the one in the second place.

For larger object size, this testing does not reflect the difference of these systems well; we prefer other testings showing how reference counting benefits performance for large objects.

See Also:
Download source code of these testings
Download these testing programs' output log
The hardware and software environment for the testings
Home | Download | Terms of Use | Privacy Statement | Contact Us
Copyright@ 2008 hnxgc.harnixworld.com, All rights reserved.