People have been posting glaring examples of ChatGPT’s gender bias, like its insistence that an attorney can’t be pregnant. So @sayashk and I tested ChatGPT on WinoBias, a standard gender-bias benchmark. Both GPT-3.5 and GPT-4 are about 3 times as likely to answer incorrectly when the correct answer defies gender stereotypes, even though the benchmark dataset was likely included in their training data. https://aisnakeoil.substack.com/p/quantifying-chatgpts-gender-bias
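For anyone curious what a "3 times as likely to be wrong" comparison looks like concretely, here is a minimal sketch of the tally: each WinoBias item is either pro-stereotypical (the correct answer matches the stereotype) or anti-stereotypical (it defies the stereotype), and you compare error rates across the two conditions. The data, counts, and function names below are invented for illustration, not our actual evaluation code or results.

```python
# Hypothetical sketch: tallying a model's WinoBias answers by condition.
# "pro" = correct answer matches the gender stereotype,
# "anti" = correct answer defies it. All numbers here are made up.
results = [
    ("pro", True), ("pro", True), ("pro", True), ("pro", False),
    ("anti", True), ("anti", False), ("anti", False), ("anti", False),
]

def error_rate(condition: str) -> float:
    """Fraction of items in the given condition the model got wrong."""
    answers = [correct for cond, correct in results if cond == condition]
    return sum(not c for c in answers) / len(answers)

pro_err = error_rate("pro")    # errors on stereotype-conforming items
anti_err = error_rate("anti")  # errors on stereotype-defying items
print(f"pro: {pro_err:.2f}  anti: {anti_err:.2f}  ratio: {anti_err / pro_err:.1f}x")
```

With these toy numbers the anti-stereotypical error rate is 3x the pro-stereotypical one, the same kind of ratio reported in the post.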