@whitequark Maybe put differently, the same kinds of arguments and excuses used to justify A/B testing are now used to justify tweaking LLMs "based on user feedback." OpenAI has even admitted that recent versions of 4o encouraged people to think of themselves as religious prophets because of how they interpreted user feedback.
I posit that understanding why A/B testing can be both useful and harmful is itself useful in deflating OpenAI's arguments.