Single-stepping in gdb, I eventually came across a longjmp() call (apparently libXt handles errors this way?), and it was this call that triggered the failure.
It turns out that each subtest checks some exceptional case and expects a function call to fail by longjmp()'ing, but only the first subtest actually prepared for the jump with a call to setjmp().
As a result, when the second subtest triggered its own longjmp(), it jumped back into the first subtest's function, which had already returned! Jumping into a frame that no longer exists is undefined behavior in C, hence the failure.
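To make the fix concrete, here is a minimal sketch of the pattern each subtest needs: arm setjmp() yourself before calling anything that is expected to bail out via longjmp(). This is not the actual test suite or libXt code. The names `error_env`, `failing_call`, `subtest_expects_failure`, and `run_subtests` are made up for illustration, and `failing_call` merely stands in for whatever library call jumps on error.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical jump target shared by the error path and the subtests. */
static jmp_buf error_env;

/* Hypothetical stand-in for a library call that reports failure by
   jumping to the most recently armed jmp_buf instead of returning. */
static void failing_call(void)
{
    longjmp(error_env, 1);
}

/* One subtest: it arms setjmp() itself, so the jump lands in a frame
   that is still live rather than in an earlier, finished subtest. */
static bool subtest_expects_failure(void)
{
    if (setjmp(error_env) != 0) {
        /* Reached via longjmp(): the call failed as expected. */
        return true;
    }
    failing_call();
    return false; /* Only reached if failing_call() returns normally. */
}

/* Runs `count` subtests back to back and returns how many saw the jump. */
static int run_subtests(int count)
{
    int passed = 0;
    for (int i = 0; i < count; i++) {
        if (subtest_expects_failure()) {
            passed++;
        }
    }
    return passed;
}
```

Calling `run_subtests(3)` should count three passes, because every subtest re-arms `error_env` before the jump. If only the first subtest called setjmp(), the second longjmp() would target a frame that has already returned, which is exactly the bug described above.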