@whitequark does one of the two implementations involve more thread switches, where you have lots of "thread A wakes thread B, then thread A goes to sleep"? AFAIK the kernel already can't handle those particularly well (https://youtu.be/KXuZi9aeGTw?t=611), and I imagine adding more noise to the scheduler could make things worse?