OK, just one nop works: the compiler inserts another at the end of the function for word alignment. (I.e. if I add 1x nop, the compiler gives me 1x nop for free; if I add 2x nops, the compiler adds none.)
The works / broken behaviour alternates in 32-bit (1-word) steps, and the pattern stays the same all the way through:
+0 words -> broken
+1 word -> works
+2 words -> broken
+3 words -> works
+4 words -> broken
etc...
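
To make the arithmetic explicit, here is a rough sketch of the pattern in C. It is only a model of what I observed, not an explanation of the cause. The names (`padded_words`, `observed_to_work`, `print_pattern`) are made up for illustration, and it assumes a 16-bit nop and 32-bit word padding, which is what the "one nop gets a free second nop" behaviour suggests. Note that a result that flips on every word repeats every 2 words, i.e. every 8 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: each nop is 16 bits wide and the compiler pads the
 * function to a 32-bit word boundary. */
enum { NOP_BYTES = 2, WORD_BYTES = 4 };

/* Words added to the function once the compiler has padded it. */
static unsigned padded_words(unsigned nops)
{
    return (nops * NOP_BYTES + WORD_BYTES - 1) / WORD_BYTES;
}

/* Observed pattern only: odd word offsets worked, even ones were broken. */
static bool observed_to_work(unsigned extra_words)
{
    return (extra_words % 2u) == 1u;
}

/* Print the nops -> words -> result table for 0..max_nops. */
static void print_pattern(unsigned max_nops)
{
    assert(max_nops < 1000u); /* keep the table a sensible size */
    for (unsigned n = 0; n <= max_nops; ++n) {
        unsigned w = padded_words(n);
        printf("%u nop(s) -> +%u word(s) -> %s\n",
               n, w, observed_to_work(w) ? "works" : "broken");
    }
}
```

Calling `print_pattern(8)` from a `main` lists the combinations up to +4 words, with 1 and 2 nops both landing on +1 word.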