Vicious criticism of LLMs from this sysadmin who has to deal with their scrapers: https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html The LLM scraper problem seems surprising to me. The makers of big new generative AI systems are mostly big-tech firms. Don't they value their reputation, or even the reputation of the AI concept overall, too much to commission these cowboys to do their scraping? But maybe they've already decided that, due to the copyright risk, it's best done at arm's length via shady intermediaries.
@russss Are you saying the data isn't even necessarily being used for training LLMs? The problem is just correlated with the rise of LLMs because LLMs are making it a lot easier to write scrapers (and I guess ChatGPT will also happily advise on how to bypass mitigations).
Software engineer somewhat specialised towards Ruby on Rails and towards back-end data work. Open data & open collaborations. Father of two. https://harrywood.co.uk/ Phasing out use of https://twitter.com/harry_wood