GNU social JP
GNU social JP is a Japanese GNU social server.

Conversation

Notices

  1. Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 08:03:46 JST

    Please enjoy this video of crawlers (mostly AI scrapers) attempting to download all of the effectively-infinite content of https://wookieepedia.org/w/

    In conversation about 2 months ago from hachyderm.io permalink

    Attachments

    1. Main Page - Wookieepedia, the hirsute encyclopedia

    • Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 09:45:01 JST
      in reply to
      • Nazo

      @nazokiyoubinbou
      Oh, no, I opened up robots.txt wide when the AI craze hit!

      In conversation about 2 months ago permalink
    • Nazo (nazokiyoubinbou@urusai.social)'s status on Sunday, 23-Mar-2025 09:45:02 JST
      in reply to

      @inthehands I find it intriguing that even though they're ignoring robots.txt they're still properly identifying themselves. Weird that they breach one trust/protocol but not the other.

      In conversation about 2 months ago permalink
    • Nazo (nazokiyoubinbou@urusai.social)'s status on Sunday, 23-Mar-2025 10:03:08 JST
      in reply to

      @inthehands ?

      You mean you set it to not disable crawling?

      Regardless though, a lot of people are complaining that these "AI" services ARE crawling their sites even though told not to by robots.txt.

      In conversation about 2 months ago permalink
    • Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 10:03:08 JST
      in reply to
      • Nazo

      @nazokiyoubinbou
      Exactly, everyone please crawl!

      But yes, I had lots of obvious crawlers slurping it during the many years when the robots.txt said not to crawl anything except the home page.

      In conversation about 2 months ago permalink
    • Fat_Farang (fat_farang@mastodon.social)'s status on Sunday, 23-Mar-2025 10:38:48 JST
      in reply to

      @inthehands Open source projects are having a tough time battling these AI assholes using up valuable resources. AI Luddites Unite!

      In conversation about 2 months ago permalink
    • Paul Cantrell (inthehands@hachyderm.io)'s status on Sunday, 23-Mar-2025 10:38:48 JST
      in reply to
      • Fat_Farang

      @Fat_Farang The good news is that this particular site has extremely low resource usage, so I’m pretty sure the crawling costs them a heck of a lot more than it costs me. Go, little bots, go! Train those models! Vraooouauooo!

      In conversation about 2 months ago permalink

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.