I wasn't sure how much it would matter to vectorize scan loops such as [<] and [>>] in Brainfuck, but on my M1 Mac I'm seeing dbfi.b going about 10x faster due to just this one optimization
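(For anyone following along: a scan loop like [>] or [<] just moves the pointer until it lands on a zero cell, so it is really a "find the next zero byte" search. Below is a minimal Python sketch of the idea, not the actual compiler from this thread. `scan_naive` and `scan_fast` are made-up names, the tape is assumed to be a `bytearray`, and the bulk search is delegated to `bytearray.find`/`rfind` rather than hand-written SIMD as a native backend presumably would.)

```python
def scan_naive(tape: bytearray, ptr: int, step: int) -> int:
    """Interpret a scan loop cell by cell: [>] is step=1, [<] is step=-1, [>>] is step=2."""
    while tape[ptr] != 0:
        ptr += step
    return ptr


def scan_fast(tape: bytearray, ptr: int, step: int) -> int:
    """Same result as scan_naive when a zero cell exists, but using bulk searches.

    Returns -1 if no zero cell is reachable (a case scan_naive does not handle).
    """
    if step == 1:
        # [>]: first zero byte at or after ptr
        return tape.find(0, ptr)
    if step == -1:
        # [<]: last zero byte at or before ptr
        return tape.rfind(0, 0, ptr + 1)
    # Other strides, e.g. [>>] or [<<]: search a strided copy of the tape.
    # The copy keeps the sketch short; a real implementation would avoid it.
    k = tape[ptr::step].find(0)
    return ptr + k * step if k >= 0 else -1


tape = bytearray([1, 1, 0, 1, 1, 1, 0, 1])
print(scan_fast(tape, 3, 1), scan_fast(tape, 5, -1))
```

The point is only that the per-cell loop collapses into one library call over a contiguous buffer, which is the kind of code that maps well onto vector compare instructions.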
@tommythorn ah, impossible to say without having worked on the other thing as well. what I can say about BF is that it's a bit of a sweet spot for a compilers class because it has plenty of low-hanging optimizations, but they mostly require BF-specific knowledge, so for example LLVM alone would not be able to pick up most of these benefits.
@tommythorn but so far in class we've only targeted native assembly. once we're done with this, we'll retarget everything to LLVM and pick up for free all of those amazing low-level optimizations that are too numerous for us to do by hand
@regehr Serious question: how does the effort it took to reach the current performance level for BF compare to what it would have taken for something less insane, say the LCC VM (which formed the basis for the Quake III VM)?