Conversation

Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 08:03:46 JST

Please enjoy this video of crawlers (mostly AI scrapers) attempting to download all of the effectively-infinite content of https://wookieepedia.org/w/
Attachments
1. Main Page - Wookieepedia, the hirsute encyclopedia
2. Video of HTTP logs in which many requests scroll past, all for paths with Wookiee nonsense-word names
- Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 09:45:01 JST
  in reply to Nazo
  @nazokiyoubinbou
  Oh, no, I opened up robots.txt wide when the AI craze hit!
  
- Nazo (nazokiyoubinbou@urusai.social)'s status on Sunday, 23-Mar-2025 09:45:02 JST
  in reply to
  
  @inthehands I find it intriguing that even though they're ignoring robots.txt, they're still properly identifying themselves. Weird that they breach one trust/protocol but not the other.
  
- Nazo (nazokiyoubinbou@urusai.social)'s status on Sunday, 23-Mar-2025 10:03:08 JST
  in reply to
  
  @inthehands ?
  You mean you set it to not disable crawling?
  Regardless, a lot of people are complaining that these "AI" services ARE crawling their sites even though robots.txt tells them not to.
  
- Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 10:03:08 JST
  in reply to Nazo
  @nazokiyoubinbou
  Exactly, everyone please crawl!
  But yes, I had lots of obvious crawlers slurping it during the many years when the robots.txt said not to crawl anything except the home page.
  
- Fat_Farang (fat_farang@mastodon.social)'s status on Sunday, 23-Mar-2025 10:38:48 JST
  in reply to
  
  @inthehands Open source projects are having a tough time battling these AI assholes using up valuable resources. AI Luddites Unite!
  
- Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 10:38:48 JST
  in reply to Fat_Farang
  @Fat_Farang The good news is that this particular site has extremely low resource usage, so I’m pretty sure the crawling costs them a heck of a lot more than it costs me. Go, little bots, go! Train those models! Vraooouauooo!
  