So @Toke@helvede.net points out (https://mastodon.social/@Toke@helvede.net/110848880977610283) that #OpenAI does claim to have unique user agent and honor robots.txt when scraping text for #ChatGPT #AI training. Not clear whether this is the only or even primary way publicly accessible web content gets into their training set though https://platform.openai.com/docs/gptbot