very annoying: to properly benchmark a garbage collector on a given program, you need a production-quality compiler / language implementation. otherwise you can't tell how much of the measured GC overhead is imposed by the compiler (e.g. the code it emits to keep stack roots precisely enumerable) rather than by the collector itself
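
to make the "precisely enumerable stack roots" cost concrete, here is a toy sketch of one common way to get it, a shadow stack. everything in it (`ShadowStack`, `build_list_rooted`, the operation counter) is hypothetical and made up for illustration. it models the bookkeeping rather than measuring real time, and a real compiler might use stack maps instead and pay far less.

```python
# Toy model of a shadow stack: the compiled code records every live
# pointer in a side structure so a precise collector can enumerate roots.
# All names here are hypothetical; this is a sketch, not a real runtime.

class ShadowStack:
    def __init__(self):
        self.frames = []      # one list of root slots per active call
        self.bookkeeping = 0  # extra operations the mutator pays for

    def push_frame(self, nslots):
        self.frames.append([None] * nslots)
        self.bookkeeping += 1

    def pop_frame(self):
        self.frames.pop()
        self.bookkeeping += 1

    def set_root(self, slot, obj):
        self.frames[-1][slot] = obj
        self.bookkeeping += 1

    def roots(self):
        # What a precise collector would scan at a collection point.
        return [r for frame in self.frames for r in frame if r is not None]


def build_list_plain(n):
    # Same work with no root bookkeeping at all.
    head = None
    for i in range(n):
        head = (i, head)
    return head


def build_list_rooted(n, stack):
    # Same work, but every update of the live pointer is also spilled
    # to the shadow stack so the collector could find it.
    stack.push_frame(1)
    head = None
    for i in range(n):
        head = (i, head)
        stack.set_root(0, head)
    live_roots = len(stack.roots())  # roots visible while the frame is live
    stack.pop_frame()
    return head, live_roots


stack = ShadowStack()
head, live_roots = build_list_rooted(100, stack)
print(stack.bookkeeping)  # 1 push + 100 root stores + 1 pop = 102
```

none of those 102 extra operations are the collector's work. they are emitted by the compiler, and a naive compiler emits many more of them than an optimizing one, which is why the same collector can look very different depending on the implementation around it.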