Afaict valgrind would not have complained about the payload without -fno-omit-frame-pointer. It was because _get_cpuid() expected the stack frame to look a certain way.
Additionally, I chose to use debian unstable to find possible portability problems earlier. Without that valgrind would have had nothing to complain.
Without having seen the odd complaints in valgrind, I don't think I would have looked deeply enough when seeing the high cpu in sshd below _get_cpuid().