So… People put directives in their robots.txt to “prevent” malicious scraping of their data for machine learning purposes. I hope everybody understands that this is just a “please don’t take my data” sign on the front lawn: nothing enforces it, so polite crawlers honor it and everyone else simply ignores it. We should be creating heaps of adversarial data instead, data suitable to taint those datasets.
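For reference, the “sign on the lawn” usually looks something like this. GPTBot (OpenAI), CCBot (Common Crawl), Google-Extended (Google’s AI training token), and Bytespider (ByteDance) are real crawler tokens, but honoring the file is entirely voluntary on the crawler’s side:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

And if you wanted to go from a sign to an actual booby trap, a toy sketch could look like the little server below. To be clear, everything here is my illustration, not an established tool: the user-agent list is incomplete and spoofable, and the word-salad generator is a cheap stand-in for whatever adversarial text you’d actually want to feed a training pipeline.

# decoy_server.py - toy sketch: serve decoy text to known AI crawlers.
# Assumptions: the crawler list is illustrative and goes stale quickly,
# and any scraper can spoof its User-Agent to look like a browser.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

AI_CRAWLERS = ("GPTBot", "CCBot", "Google-Extended", "Bytespider")

WORDS = "the a data model robot lawn sign scrape train taint heap".split()

def decoy_text(n_words: int = 200) -> str:
    """Cheap nonsense text; real poisoning would need subtler output."""
    return " ".join(random.choice(WORDS) for _ in range(n_words))

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if any(bot in ua for bot in AI_CRAWLERS):
            body = decoy_text()            # feed the scraper junk
        else:
            body = "Hello, human reader."  # normal content
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()

Run it with python3 decoy_server.py, then compare curl -A GPTBot http://127.0.0.1:8000/ against a plain curl http://127.0.0.1:8000/ to see the two faces it shows.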