Hardware-assisted garbage collection

Hardware-assisted garbage collection is the use of specialized hardware mechanisms to improve the efficiency and performance of garbage collection in computer systems. This approach integrates hardware support directly into the processor or memory system to handle tasks traditionally managed by software, such as object allocation, reference counting, or mark-and-sweep operations. It is particularly relevant in real-time systems, embedded systems, and high-performance computing environments where software-only garbage collection may introduce unacceptable pauses or overhead.

History

Hardware support for garbage collection dates back to at least the 1980s. In the 1990s, simulation studies analyzed the behavior of such systems,[1] and proposals emerged for hardware-assisted collection in hard real-time systems.[2] Modern developments include accelerators for tracing garbage collection and concurrent collectors for functional languages.[3]

Lisp machines of the 1980s included hardware features for garbage collection. The Symbolics 3600 used a tagged architecture with per-word tags to distinguish pointers from other data, enabling efficient garbage collection operations in hardware and microcode.[4] In the 1990s, Kelvin Nilsen and others advanced real-time hardware-assisted approaches through simulation and prototypes. A 1994 implementation demonstrated high throughput with bounded allocation and collection times suitable for hard real-time systems.[5]

Beyond Lisp machines and research prototypes, notable historical systems include the Garbage Collected Memory Module, a near-memory co-processor design,[6] and Azul Systems' Vega processors, which included hardware support for read barriers to enable pauseless garbage collection.[7]

While many early proposals did not achieve widespread adoption, renewed interest has emerged due to the slowing of Moore's law, the prevalence of garbage-collected languages, and the rise of cloud computing and hardware accelerators.

Mechanisms

Hardware assistance can involve dedicated instructions for memory allocation, reference tracking, or collection phases. For example, some architectures provide support for bitmap marking or concurrent marking to reduce pauses.[8] In cloud environments, hardware instructions have been proposed to accelerate garbage collection hotspots, with reported speedups of up to an order of magnitude for the accelerated operations.[9][10]
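The bitmap marking mentioned above can be illustrated in software. The following is a minimal sketch, not the design of any particular processor: it keeps one mark bit per heap granule in a side table, which is the kind of operation a dedicated marking instruction would perform. The class name `MarkBitmap`, the 16-byte granule, and the addresses used are assumptions made for illustration.

```python
class MarkBitmap:
    """One mark bit per fixed-size heap granule, stored outside the objects.

    Illustrative only: granule size and layout are assumptions.
    """

    def __init__(self, heap_base, heap_size, granule=16):
        self.heap_base = heap_base
        self.granule = granule
        n_bits = heap_size // granule
        self.bits = bytearray((n_bits + 7) // 8)

    def _locate(self, addr):
        # Map an address to (byte index, bit mask) in the bitmap.
        index = (addr - self.heap_base) // self.granule
        return index >> 3, 1 << (index & 7)

    def mark(self, addr):
        """Set the mark bit; return True only if the object was newly marked."""
        byte, mask = self._locate(addr)
        if self.bits[byte] & mask:
            return False
        self.bits[byte] |= mask
        return True

    def is_marked(self, addr):
        byte, mask = self._locate(addr)
        return bool(self.bits[byte] & mask)
```

Keeping the mark bits in a dense side table means marking does not write to the objects themselves, and a sweep can scan the bitmap instead of the heap.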

Generational collection schemes can also be combined with hardware support to meet real-time constraints. Proposals include integrated hardware collectors that run continuously in the background for embedded systems.[11]

Read and write barriers

Read barriers and write barriers are the mechanisms garbage collectors use to coordinate application (mutator) threads with the collector. A write barrier is a fragment of code executed before or after every pointer store to maintain collector invariants, such as tracking cross-generational references;[12] a read barrier is the corresponding fragment executed on pointer loads. Barriers can be implemented either through additional compiler-inserted instructions or by leveraging hardware features such as memory protection, and dedicated hardware support can make them cheaper than software alone.[13]
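As an illustration of a write barrier that tracks cross-generational references, the sketch below models card marking: every pointer store also sets a byte in a card table, so the collector later scans only the "dirty" regions. This is a simplified software model under stated assumptions. The `Heap` class, the 512-byte card size, and the dictionary standing in for memory are all hypothetical.

```python
CARD_SHIFT = 9  # assumed card size of 512 bytes (2**9)


class Heap:
    """Toy heap whose pointer stores go through a card-marking write barrier."""

    def __init__(self, size):
        self.slots = {}  # simulated memory: slot address -> stored reference
        self.card_table = bytearray((size >> CARD_SHIFT) + 1)

    def write_ref(self, slot_addr, new_ref):
        # The store itself.
        self.slots[slot_addr] = new_ref
        # Post-write barrier: mark the card containing the updated slot.
        self.card_table[slot_addr >> CARD_SHIFT] = 1

    def dirty_cards(self):
        """Indices of cards the collector must scan for updated references."""
        return [i for i, dirty in enumerate(self.card_table) if dirty]
```

For example, after `Heap(4096).write_ref(1030, ref)` only card 2 (covering addresses 1024 to 1535) is dirty. A hardware-assisted barrier would perform the equivalent of the card-table update without extra instructions in the mutator's code path.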

Some systems, notably Lisp machines, provided hardware support for forwarding pointers, allowing both old and new addresses to be used interchangeably without explicit checks.[14] Read barriers are needed by fewer collector algorithms than write barriers but may have greater performance impact since pointer reads are more frequent than writes.[15]
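The effect of forwarding pointers can be sketched in software as a read barrier that follows the forwarding chain on every reference load. Hardware support makes this check implicit, whereas the sketch below pays for it explicitly. The `Obj` class and its `forward` field are hypothetical names chosen for illustration.

```python
class Obj:
    """Toy heap object with an optional forwarding pointer."""

    def __init__(self, value):
        self.value = value
        self.forward = None  # set by the collector when the object is moved


def read_barrier(ref):
    """Return the current copy of an object, following forwarding pointers."""
    while ref.forward is not None:
        ref = ref.forward
    return ref
```

With such a barrier, a mutator holding the old address and one holding the new address both reach the same object, which is what allows the collector to move objects while the program runs.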

Applications

Hardware-assisted garbage collection has been explored for virtual machines in cloud computing, where it aims to reduce middleware overhead. It is also relevant for non-strict functional languages with concurrent collectors.[16] Hardware accelerators have been proposed for tracing garbage collection in modern architectures.[17]

Implementations in the wild

System / CPU | Year introduced | Hardware feature(s) | Collector type
Symbolics 3600 | 1983 | Per-word tags, list micro-ops | Incremental, generational
IBM Wortmann GC FPGA | 2014 | RTL heap walker | Stall-free realtime
Azul Vega 3 | 2010 | Barrier opcode, concurrent remap unit | C4 pauseless generational
ARMv9-A (MTE) | 2021 | 4-bit memory/pointer tags | Concurrent, safety-aided

Performance

In controlled benchmarks, offloading the mark phase to a hardware accelerator cut GC time by 65–80% and total application time by up to 25%.[18] Data-center JVM studies reported pauses consistently below 1 ms at the 99.999th percentile on 256 GB heaps using Azul's C4 on Vega hardware.[19]
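The relationship between the two benchmark figures can be checked with a back-of-the-envelope model (an illustrative sketch, not a result from the cited study). Suppose garbage collection accounts for a fraction f of the original run time T, and the accelerator removes a fraction r of that GC time while leaving the rest of the program unaffected. The symbols f, r, T, and T' are introduced here only for illustration.

```latex
T' = (1 - f)\,T + (1 - r)\,f\,T = (1 - f r)\,T
\quad\Longrightarrow\quad
\frac{T - T'}{T} = f r
```

Under these assumptions, an 80% cut in GC time (r = 0.8) translates into a 25% cut in total time only when f = 0.25 / 0.8 ≈ 0.31, that is, when collection consumes roughly a third of the run. Workloads that spend less time in GC would see proportionally smaller end-to-end gains.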

Azul Systems and pauseless collection

Azul Systems developed one of the most commercially successful hardware-assisted garbage collection systems. Beginning in 2005, Azul shipped a pauseless garbage collector on its custom Vega hardware platform. Three successive generations of Vega systems relied on custom multi-core processors and a custom OS kernel to deliver the scale and features needed to support pauseless garbage collection.[20]

The Azul architecture included a dedicated read barrier instruction that enabled a highly concurrent, parallel, and compacting garbage collection algorithm. This barrier allows the collector and mutator threads to progress in parallel, with a "self-healing" property ensuring that object references are corrected once and benefit all application threads.[21] The C4 (Continuously Concurrent Compacting Collector) algorithm evolved from this work, using read barriers to support concurrent compaction, concurrent remapping, and concurrent incremental update tracing.[20] C4 allows JVMs to scale to tens or hundreds of gigabytes in heap sizes while sustaining multi-gigabyte-per-second allocation rates.[20]
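The "self-healing" property can be sketched as follows. This is a simplified software model and not Azul's actual barrier instruction: when a loaded reference points to a relocated object, the barrier corrects the reference and writes the corrected value back to the memory slot it came from, so later loads of that slot take the fast path. The `Collector` class, the `relocated` table, and the dictionary standing in for memory are assumptions made for illustration.

```python
class Collector:
    """Toy model of a self-healing read barrier."""

    def __init__(self):
        self.relocated = {}      # old address -> new address
        self.slow_path_hits = 0  # how often the barrier had to repair a slot

    def load_ref(self, memory, slot):
        ref = memory[slot]
        if ref in self.relocated:
            # Barrier trips: the reference points at a moved object.
            self.slow_path_hits += 1
            ref = self.relocated[ref]
            memory[slot] = ref   # self-heal: fix the slot for all later readers
        return ref
```

In this model, loading the same slot twice triggers the slow path only once, because the first load repairs the stored reference. Any thread that reads the slot afterwards benefits from the correction.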

Modern architectural support

While specialized garbage collection hardware was once the domain of niche machines, features in general-purpose processors are increasingly being adapted for this purpose.

ARM Memory Tagging Extension (MTE)

MTE was introduced in the Armv8.5-A architecture and is part of Armv9-A. It provides hardware support for tagging memory: each 16-byte granule carries a 4-bit tag, and memory accesses are checked against the tag held in the upper bits of the pointer. While designed primarily as a security feature to detect buffer overflows and use-after-free bugs, it can also be used to assist garbage collection. By tagging memory granules, a collector can record the state of memory blocks, which may speed up the mark and sweep phases while ensuring that the mutator does not access stale memory.[22]
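The tag check can be modeled in a few lines. The sketch below is a software simulation, not an interface to real MTE hardware: it assumes 16-byte granules and a 4-bit tag carried in bits 59:56 of the pointer, and shows how retagging freed memory makes a stale pointer fail the check. The `TaggedMemory` class and its method names are hypothetical.

```python
GRANULE = 16     # bytes covered by one allocation tag
TAG_SHIFT = 56   # pointer bits 59:56 hold the tag in this model
TAG_MASK = 0xF   # 4-bit tags


class TaggedMemory:
    """Toy model of tag-checked memory accesses."""

    def __init__(self):
        self.alloc_tags = {}  # granule index -> 4-bit allocation tag

    def set_tag(self, addr, size, tag):
        """Tag every granule in [addr, addr + size) and return a tagged pointer."""
        first = addr // GRANULE
        last = (addr + size + GRANULE - 1) // GRANULE
        for granule in range(first, last):
            self.alloc_tags[granule] = tag & TAG_MASK
        return addr | ((tag & TAG_MASK) << TAG_SHIFT)

    def check(self, ptr):
        """Return True if the pointer's tag matches the memory's tag."""
        tag = (ptr >> TAG_SHIFT) & TAG_MASK
        addr = ptr & ((1 << TAG_SHIFT) - 1)
        return self.alloc_tags.get(addr // GRANULE, 0) == tag
```

A collector or allocator that changes a block's tag when the block is reclaimed causes any surviving pointer with the old tag to fail `check`, which is the property that keeps the mutator away from stale memory.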

Intel Linear Address Masking (LAM)

Intel's Linear Address Masking (LAM) allows software to store metadata in upper pointer bits that are not used in address translation. This is particularly beneficial for garbage collectors that use "pointer coloring," a technique in which metadata about an object is encoded directly in the pointer that refers to it. With LAM enabled, the processor ignores these metadata bits when the pointer is dereferenced, so a collector can keep an object's status in the pointer itself, avoiding a separate metadata lookup, without masking the bits off before each memory access.[23]
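Pointer coloring can be sketched with plain integer arithmetic. In the model below, six metadata bits sit above the address bits of a 64-bit pointer. The bit positions, the color names `MARKED` and `REMAPPED`, and the helper functions are illustrative assumptions rather than a definitive layout. Without hardware masking, software must call something like `strip` before every dereference, and that is the step address masking makes unnecessary.

```python
COLOR_SHIFT = 57                  # assumed position of the metadata bits
COLOR_MASK = 0x3F << COLOR_SHIFT  # six metadata bits in this model

MARKED = 0x01    # hypothetical color: object reached during marking
REMAPPED = 0x02  # hypothetical color: reference already points to the new copy


def color(ptr, bits):
    """Return ptr with its metadata bits replaced by `bits`."""
    return (ptr & ~COLOR_MASK) | (bits << COLOR_SHIFT)


def color_of(ptr):
    """Read the metadata bits without touching memory."""
    return (ptr & COLOR_MASK) >> COLOR_SHIFT


def strip(ptr):
    """Recover the plain address; needed in software when hardware does not mask."""
    return ptr & ~COLOR_MASK
```

For example, `color(0x7F00DEADBEEF, MARKED)` produces a value that differs from the raw address but yields the same address after `strip`, and whose status can be read with `color_of` alone.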

RISC-V extensions

Research into the RISC-V ISA has led to the development of custom extensions like "Graceless," which implements a hardware-based tracing accelerator. This system offloads the traversal of object graphs to a dedicated unit near the memory controller. By processing the heap independently of the main CPU cores, it minimizes cache pollution and significantly reduces the duration of "stop-the-world" events in high-throughput applications.[24]
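The work such a tracing unit takes over is, at its core, a graph traversal. The sketch below shows that traversal in software, using a mark queue and a set of visited objects. It is a generic illustration of the mark phase, not the design of the accelerator described above. The function name `mark` and the dictionary representation of the heap are assumptions.

```python
from collections import deque


def mark(roots, heap):
    """Return the set of objects reachable from `roots`.

    `heap` maps each object id to the list of object ids it references.
    """
    marked = set()
    queue = deque(roots)  # the mark queue a hardware unit would manage
    while queue:
        obj = queue.popleft()
        if obj in marked:
            continue
        marked.add(obj)
        queue.extend(heap[obj])  # enqueue outgoing references
    return marked
```

Each iteration performs a dependent memory load whose target is hard to predict, which is why running the loop on general-purpose cores tends to pollute their caches and why moving it closer to memory is attractive.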

Advantages and challenges

Hardware assistance can offer more predictable operation, higher performance, and shorter pauses, making it suitable for real-time systems.[25] However, it may require custom hardware, which has limited its adoption in general-purpose CPUs.[26]

References

  1. ^ Nilsen, Kelvin (18 December 1995). "Progress in hardware-assisted real-time garbage collection". Memory Management. Lecture Notes in Computer Science. Vol. 986. SpringerLink. pp. 355–379. doi:10.1007/3-540-60368-9_34. ISBN 978-3-540-60368-9. Retrieved 27 October 2025.
  2. ^ Nilsen, Kelvin; Schmidt, William (31 October 1992). "Hardware-Assisted General-Purpose Garbage Collection for Hard Real-Time Systems". Iowa State University. Retrieved 27 October 2025.
  3. ^ García, Andrés Amaya; May, David; Nutting, Ed (9 July 2021). "Integrated Hardware Garbage Collection". ACM Transactions on Embedded Computing Systems. 20 (5): 1–25. doi:10.1145/3450147. Retrieved 27 October 2025.
  4. ^ Moon, David A. (1985). "Architecture of the Symbolics 3600". ACM SIGARCH Computer Architecture News. 13 (3). ACM: 76–83. doi:10.1145/327070.327133. Retrieved 24 February 2026.
  5. ^ Schmidt, William J.; Nilsen, Kelvin D. (1994). "Performance of a hardware-assisted real-time garbage collector". ACM SIGPLAN Notices. 29 (11): 76–85. doi:10.1145/195470.195504. Retrieved 24 February 2026.
  6. ^ Schmidt, William J.; Nilsen, Kelvin D. (1994). "Performance of a hardware-assisted real-time garbage collector". ACM SIGPLAN Notices. 29 (11): 76–85. doi:10.1145/381792.195504. Retrieved 27 October 2025.
  7. ^ Maas, Martin; Asanović, Krste; Kubiatowicz, John. "A New Proposal for Hardware-assisted Garbage Collection" (PDF). University of California, Berkeley. Retrieved 27 October 2025.
  8. ^ "functional programming - Hardware Assisted Garbage Collection - Stack Overflow". Stack Overflow. 12 November 2011. Retrieved 27 October 2025.
  9. ^ Tang, Jie; Liu, Shaoshan; Gu, Zhimin; Li, Xiao-Feng; Gaudiot, Jean-Luc (9 November 2010). "Achieving middleware execution efficiency: hardware-assisted garbage collection operations". The Journal of Supercomputing. 59 (3): 1101–1119. doi:10.1007/s11227-010-0493-0. Retrieved 27 October 2025.
  10. ^ Tang, Jie; Liu, Shaoshan; Gu, Zhimin; Li, Xiao-Feng; Gaudiot, Jean-Luc (2010). Hardware-assisted middleware: Acceleration of garbage collection operations. IEEE Xplore. pp. 281–284. Bibcode:2010asap.conf...70T. doi:10.1109/ASAP.2010.5541011. ISBN 978-1-4244-6966-6.
  11. ^ "Integrated hardware garbage collection for real-time embedded systems - University of Bristol". University of Bristol. 28 September 2021. Retrieved 27 October 2025.
  12. ^ "The JVM Write Barrier - Card Marking". Psychosomatic, Lobotomy, Saw. October 2014. Retrieved 24 February 2026.
  13. ^ "GC FAQ -- algorithms". GC List. Retrieved 24 February 2026.
  14. ^ "GC FAQ -- algorithms". GC List. Retrieved 24 February 2026.
  15. ^ "Garbage Collection with LLVM". LLVM Documentation. Retrieved 24 February 2026.
  16. ^ Ramsay, Craig; Stewart, Robert (2024). "Cloaca: A Concurrent Hardware Garbage Collector for Non-strict Functional Languages". Proceedings of the 17th ACM SIGPLAN International Haskell Symposium. pp. 41–54. doi:10.1145/3677999.3678277. ISBN 979-8-4007-1102-2. Retrieved 27 October 2025.
  17. ^ Maas, Martin; Asanović, Krste; Kubiatowicz, John (2018). "A Hardware Accelerator for Tracing Garbage Collection". 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). pp. 138–151. doi:10.1109/ISCA.2018.00022. ISBN 978-1-5386-5984-7.
  18. ^ Maas, Martin; Asanović, Krste; Kubiatowicz, John (June 2018). "A Hardware Accelerator for Tracing Garbage Collection". 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). pp. 138–151. doi:10.1109/ISCA.2018.00022. ISBN 978-1-5386-5984-7. Retrieved 27 October 2025.
  19. ^ Azul Systems (2025). "Azul C4 Garbage Collector". Azul.com. Azul Systems. Retrieved 27 October 2025.
  20. ^ a b c Tene, Gil; Iyengar, Balaji; Wolf, Michael (2011). "C4: The continuously concurrent compacting collector". ACM SIGPLAN Notices. 46 (11): 79–88. doi:10.1145/2076022.1993491.
  21. ^ "The Azul Garbage Collector". InfoQ. 24 February 2011. Retrieved 24 February 2026.
  22. ^ "Memory Tagging Extension (MTE) User Guide". ARM Developer. Retrieved 24 February 2026.
  23. ^ "Linear Address Masking (LAM)". Intel Corporation. Retrieved 24 February 2026.
  24. ^ Maas, Martin; Asanović, Krste; Kubiatowicz, John (2018). "A Hardware Accelerator for Tracing Garbage Collection". 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). pp. 138–151. doi:10.1109/ISCA.2018.00022. ISBN 978-1-5386-5984-7.
  25. ^ "Richard Jones' Garbage Collection Bibliography". 15 January 2025. Retrieved 27 October 2025.
  26. ^ "blogorrhea: Hardware-assisted garbage collection". 6 November 2008. Retrieved 27 October 2025.