Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors (Kyle Wiggers/TechCrunch)
https://techcrunch.com/2024/01/13/anthropic-researchers-find-that-ai-models-can-be-trained-to-deceive/
http://www.techmeme.com/240113/p9#a240113p9