GNU social JP
  • FAQ
  • Login
GNU social JPは日本のGNU socialサーバーです。
Usage/ToS/admin/test/Pleroma FE
  • Public

    • Public
    • Network
    • Groups
    • Featured
    • Popular
    • People

... Summing up the top UA groups, it looks like my server is doing 70% of all its work for these fucking LLM training bots that don’t to anything except for crawling the fucking internet over and over again. Oh, and of course, they don’t just crawl a page once and then move on. Oh, no, they come back every 6 hours because lol why not. They also don’t give a single flying fuck about robots.txt, because why should they. And the best thing of all: they crawl the stupidest pages possible. Recently, both ChatGPT and Amazon were - at the same time - crawling the entire edit history of the wiki. And I mean that - they indexed every single diff on every page for every change ever made. Frequently with spikes of more than 10req/s. Of course, this made MediaWiki and my database server very unhappy, causing load spikes, and effective downtime/slowness for the human users. If you try to rate-limit them, they’ll just switch to other IPs all the time. If you try to block them by User Agent string, they’ll just switch to a non-bot UA string (no, really). This is literally a DDoS on the entire internet.

Download link

https://files.mastodon.social/media_attachments/files/113/724/239/725/832/913/original/ced017dd3cd7a9c6.png

Notices where this attachment appears

  1. Embed this notice
    khobochka (khobochka@mastodon.social)'s status on Saturday, 28-Dec-2024 00:32:55 JST khobochka khobochka

    #LLMs are a fucking scourge. Perceiving their training infrastructure as anything but a horrific all-consuming parasite destroying the internet (and wasting real-life resources at a grand scale) is delusional.

    #ChatGPT isn't a fun toy or a useful tool, it's a _someone else's_ utility built with complete disregard for human creativity and craft, mixed with malicious intent masquerading as "progress", and should be treated as such.

    https://pod.geraspora.de/posts/17342163

    In conversation about 5 months ago from mastodon.social permalink
  • Help
  • About
  • FAQ
  • TOS
  • Privacy
  • Source
  • Version
  • Contact

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

Creative Commons Attribution 3.0 All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.