FWIW llama3.1 and gemma2 have been doing much better — not getting all the right answers, but sticking to the instructions well.