in order to single-step on this goddamn core, you have to program the two breakpoint/watchpoint units in tandem
the second one is used to match on only the address of the current instruction
and the first one is used to build a NOT gate on the output of the second one, so the break fires on any instruction that _isn't_ the current one
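
rough sketch of the logic as a toy python model, in case it helps. the function names and the way the units are chained here are made up for illustration, this is not the actual register interface of the core:

```python
# toy model only: names and chaining behaviour are assumptions for illustration,
# not the real debug registers of any particular core

def unit2_match(fetch_addr, current_pc):
    # second unit: address-only comparator, fires on the current instruction
    return fetch_addr == current_pc

def unit1_break(fetch_addr, current_pc):
    # first unit: takes unit 2's output and inverts it (the "NOT gate")
    return not unit2_match(fetch_addr, current_pc)

def single_step(fetch_trace, current_pc):
    # resume the core, then halt at the first fetch that is NOT the current instruction
    for addr in fetch_trace:
        if unit1_break(addr, current_pc):
            return addr
    return None  # never left the current instruction, so no break
```

so `single_step([0x100, 0x104, 0x108], 0x100)` halts at `0x104`, and a taken branch still halts at wherever it lands, since anything that isn't the current address trips the break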
it took me like 20 minutes to figure out how to implement this, _and i write RTL professionally_
imagine being a developer with no RTL background or something :)