World Playground Deceit.net

How I Learned to Stop Worrying and Love GC



I have way too much free time these days, even with a full-time job, so here's a post I've wanted to write for some time: one on the evolution of my opinion towards automatic garbage collection (henceforth GC; as opposed to manual memory management, MMM).

Initial C weenie stance

My programmer life started in university with ANSI C, where I quickly became a ricer (ArchBang was all the rage then) then a suckless cultist. Really the usual story of a reclusive young man with too much tinkerer blood in his veins and not enough social life.

You should then be able to write the following description in my stead: I considered GC as something for wusses and webdevs, certainly useful but not worthy of my attention compared to the purity of malloc/free.

To elaborate on this, here were the distinct facets of that disdain:

  1. Rigour: I viewed MMM as part of an ideological last stand against the West's new ubiquitous cult of hedonism and laziness, something to instill attention to detail into young students who definitely needed it (including me).
  2. Performance: the usual tribal mockery to the tune of "Java will eat all your memory and ask for dessert". Fueled by a vague understanding that you had to give up either throughput, latency or memory usage in the trade for convenience. And the runtime size, mamma mia!
  3. Simplicity: a performant GC implementation seemed an extremely complex machinery, compared to a reasonable malloc style allocator.
  4. Learning: I later adopted the view that something automagic like GC would make teaching programming harder; another complicated concept to intuit compared to the "lend me N bytes, I'll give them back later" model.

Insights and updated views

Funnily, I became quite interested in computer science a few years after completing my CS MSc. That renewed interest and the resulting knowledge gains changed and/or tempered my views on those four points:

  1. I still believe that trudging through years of MMM helps ward against sloppiness. But more importantly, I'm absolutely convinced that you can only truly appreciate modern amenities like GC if you've personally suffered without; same with sane metaprogramming or parametric typing when coming from C macro contraptions like c-vector and sys/queue.h.
  2. I now know that the performance story is much more complicated than "GC suckz". On one hand, yes, MMM still has a general edge on that front - the runtime has less to do, after all - but consider the following:
    • Generational copying/compacting GC allows for bump allocation in the nursery, thus making heap allocation as fast as stack growing.
    • Incremental and concurrent GC like ZGC or Shenandoah will give you the low pause times you need for soft real-time programs like video games.
    • Hard real-time GC actually exists: see IBM Metronome.
    • The state of the art never stopped advancing, the dream of having your cake and eating it has never been so near. See generational ZGC for a good example (and Netflix's production numbers with it).
  3. On the point of simplicity, it's true that a performant GC is much more complex than most malloc-type allocators, if only for the increased reliance on unportable code and the code instrumentation needed for barriers. But I also became aware that performant and multithreaded malloc implementations (e.g. jemalloc, tcmalloc, mimalloc) are anything but simple.
  4. As for learning, I'm ambivalent. On one hand, the difference between stack and heap isn't any easier to teach than GC, but on the other hand, anyone who wants to become a Real Programmer™ will have to learn low level programming anyway to understand what's under the hood of most operating systems and language runtimes.
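To make the bump allocation point above concrete, here is a toy model in Python. It is only a sketch: the `Nursery` class, its method names and the 8-byte alignment are made up for illustration, and a real collector works on raw memory and evacuates survivors instead of simply resetting. What it does show is why allocation in a nursery costs about as much as growing a stack: round up, compare, add.

```python
class Nursery:
    """Toy model of a young-generation space with bump-pointer allocation."""

    def __init__(self, size: int, align: int = 8) -> None:
        # Hypothetical parameters; alignment must be a power of two for the mask trick.
        assert align > 0 and align & (align - 1) == 0, "align must be a power of two"
        self.heap = bytearray(size)
        self.align = align
        self.top = 0  # the bump pointer: everything below it is in use

    def alloc(self, nbytes: int):
        """Return the offset of a fresh block, or None when the nursery is full."""
        start = (self.top + self.align - 1) & ~(self.align - 1)  # round up
        end = start + nbytes
        if end > len(self.heap):
            return None  # a real GC would trigger a minor collection here
        self.top = end  # the whole allocation is this one pointer bump
        return start

    def reset(self) -> None:
        """Stand-in for a minor collection: once survivors have been copied
        elsewhere, the entire space is reclaimed at once."""
        self.top = 0
```

There is no free list to search and no per-object free: dead objects cost nothing, only survivors need work when the nursery fills up.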

I also gained an understanding of more subtle issues I initially didn't know about:

  • How GC interacts negatively with FFI. From value boxing forcing conversion in both directions to moving GC having to treat FFI pointers differently, everything becomes a large mess.
  • RCU (read-copy-update), one of the leading threading primitives for frequently read & occasionally written data, needs some form of GC to operate. And yes, reference counting qualifies as GC.
  • The very narrow API of malloc and co. has the advantage of being so simple it can be swapped via a simplistic LD_PRELOAD. Nice. But it also limits experimentation needing a deeper integration between the language and its compiler (e.g. escape analysis, LLVM ARC).
  • GC is one of the rare computer science fields still getting real-world progress: Immix (2008, implemented in MMTk, SBCL, Guile, Haxe), userfaultfd (2015, Linux 4.3, used in Android 13), mpl (2016~202x), Azul Pauseless/C4 (2010), etc…
  • GC languages should look at C++/Rust's opt-in reference counting for select data and provide the exact opposite: optional MMM like SBCL's new arenas.
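The RCU point in the list above is easier to see with a sketch. The following Python is a hypothetical, simplified cell (the names `Version`, `RcuCell` and `watch_reclaim` are mine, not from any library): readers grab the current version without locking, writers copy, modify and publish a new one. Note what is missing: nothing ever frees the old version. Something has to work out when the last reader is done with it, and that something is a garbage collector, here CPython's reference counting.

```python
import threading
import weakref


class Version:
    """One snapshot of the shared data, treated as read-only once published."""

    def __init__(self, data):
        self.data = dict(data)


class RcuCell:
    """Hypothetical RCU-style cell: lock-free reads, copy-then-publish writes."""

    def __init__(self, data):
        self._current = Version(data)
        self._write_lock = threading.Lock()  # serializes writers only

    def read(self):
        # Readers just take a reference to the current version; no lock.
        return self._current

    def update(self, mutate):
        with self._write_lock:
            fresh = Version(self._current.data)  # copy
            mutate(fresh.data)                   # update the private copy
            self._current = fresh                # publish with one reference store
        # The old version is never freed explicitly: it disappears once the
        # last reader holding it lets go, which is the collector's job.


def watch_reclaim(version, log, label):
    """Demo helper: append label to log when version gets reclaimed."""
    weakref.finalize(version, log.append, label)
```

A reader that called `read()` before an `update()` keeps seeing its old snapshot for as long as it holds the reference. On CPython the old version is reclaimed as soon as that reference is dropped; other Python implementations may reclaim it later. In C, the same pattern needs grace periods, hazard pointers or manual reference counts to answer the very same question.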

And finally, becoming a Tcl then Lisp programmer gave me an epiphany I rarely read about: the very best perk of GC is that it cleanses code from the visual and conceptual pollution that is memory management logic (an implementation detail at best). GC is necessary if one wants to approach the clarity of pseudocode.

New stance

Here's what I think these days: unless your technical requirements (resource constraints, hard real-time, limited language choice/implementation quality) justify it, you'd be a fool to reject GC.

There are two sorts of such "fools" who categorically shun GC outside of scripting languages:

  • The old curmudgeon - often Grug brained to a fault - who never programmed with GC and (understandably) doesn't want to learn to, as he's set in his ways and senses that it's not as simple as "duuude, you don't have to remember to free anymore!". Possibly stuck in the 80s, when general purpose computers didn't have much memory or processing power and the GC world was still evolving.
  • The autistic clockmaker type who gets physically ill when he doesn't feel in control of every little cog. Loves bikeshedding, yak shaving and reinventing the wheel to the point of Greenspunning. Often seen using C, C++, Rust and other languages offering infinite tinkering opportunities.

NB: these aren't exclusive.
