Might be a bit anticlimactic, but I didn't investigate. I just switched to using the SPI peripheral instead, sending just the right bytes at just the right frequency to get the right timing patterns, using only the data line and no clock line. This does give me an 11760-byte framebuffer for a 20-character display (980 LEDs, so 12 bytes per LED), but the ESP32 has more than enough RAM. Now it works glitch-free! :D