V useful info, all in one place, about the datasets GPT-3.5 is trained on. I knew this, but it still takes your breath away:
"WebText2 is a private OpenAI dataset created by crawling links from Reddit that had three upvotes.
The idea is that these URLs are trustworthy and will contain quality content."
Have these ppl ever READ Reddit?!
https://www.searchenginejournal.com/how-to-block-chatgpt-from-using-your-website-content/478384/