@TedUnderwood you are referring to #RLHF (reinforcement learning from human feedback) as a way for human authors to correct transformer output. But the technique also covers learning preferences from humans, and this aspect has received little attention in the debate about #LLMs, though it may well be decisive for ChatGPT’s success. What is your opinion on this? https://proceedings.neurips.cc//paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html