Please enjoy this video of crawlers (mostly AI scrapers) attempting to download all of the effectively-infinite content of https://wookieepedia.org/w/
Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 08:03:46 JST Paul Cantrell
Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 09:45:01 JST Paul Cantrell
@nazokiyoubinbou
Oh, no, I opened up robots.txt wide when the AI craze hit!
Nazo (nazokiyoubinbou@urusai.social)'s status on Sunday, 23-Mar-2025 09:45:02 JST Nazo
@inthehands I find it intriguing that even though they're ignoring robots.txt they're still properly identifying themselves. Weird that they breach one trust/protocol but not the other.
Nazo (nazokiyoubinbou@urusai.social)'s status on Sunday, 23-Mar-2025 10:03:08 JST Nazo
You mean you set it to not disable crawling?
Regardless though, a lot of people are complaining that these "AI" services ARE crawling their sites even though told not to by robots.txt.
Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 10:03:08 JST Paul Cantrell
@nazokiyoubinbou
Exactly, everyone please crawl!
But yes, I had lots of obvious crawlers slurping it during the many years when the robots.txt said not to crawl anything except the home page.
Fat_Farang (fat_farang@mastodon.social)'s status on Sunday, 23-Mar-2025 10:38:48 JST Fat_Farang
@inthehands Open source projects are having a tough time battling these AI assholes using up valuable resources. AI Luddites Unite!
Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 10:38:48 JST Paul Cantrell
@Fat_Farang The good news is that this particular site has extremely low resource usage, so I’m pretty sure the crawling costs them a heck of a lot more than it costs me. Go, little bots, go! Train those models! Vraooouauooo!