Anyway, for now it's time to fall back to the code I had before doing indirect access: slower, but it gets the job done.
There's definitely potential for more optimization: improving the FPGA-MCU communication, having the MCU spend more time in sleep, and so on.
But I think at this point it's probably worth starting to work on OTA.