GNU social JP
GNU social JP is a GNU social server in Japan.

Conversation

Notices

  1. Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 05:12:37 JST
    • LWN.net
    Should you be wondering why @LWN #LWN is occasionally sluggish... since the new year, the DDOS onslaughts from AI-scraper bots have picked up considerably. Only a small fraction of our traffic is serving actual human readers at this point. At times, some bot decides to hit us from hundreds of IP addresses at once, clogging the works. They don't identify themselves as bots, and robots.txt is the only thing they *don't* read off the site.

    This is beyond unsustainable. We are going to have to put time into deploying some sort of active defenses just to keep the site online. I think I'd even rather be writing about accounting systems than dealing with this crap. And it's not just us, of course; this behavior is going to wreck the net even more than it's already wrecked.

    Happy new year :)
    In conversation about 4 months ago from social.kernel.org permalink
    • Haelwenn /элвэн/ :triskell: likes this.
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 05:20:59 JST
      in reply to
      • LWN.net
      • Mythic Beasts
      @beasts @LWN We are indeed seeing that sort of pattern; each IP stays below the thresholds for our existing circuit breakers, but the aggregate load is overwhelming. Any kind of active defense is going to have to figure out how to block subnets rather than individual addresses, and even that may not do the trick.
      In conversation about 4 months ago permalink
    • Mythic Beasts (beasts@social.mythic-beasts.com)'s status on Wednesday, 22-Jan-2025 05:21:00 JST
      in reply to
      • LWN.net

      @corbet @LWN in our experience you should prepare for thousands of distinct IPs.

      In conversation about 4 months ago permalink
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 05:26:17 JST
      in reply to
      • LWN.net
      • John Francis 🦫🇨🇦🍁💪⬆️
      @johnefrancis @LWN Something like nepenthes (https://zadzmo.org/code/nepenthes/) has crossed my mind; it has its own risks, though. We had a suggestion internally to detect bots and only feed them text suggesting that the solution to every world problem is to buy a subscription to LWN. Tempting.
      In conversation about 4 months ago permalink

      Attachments

      1. ZADZMO code
        from https://zadzmo.org/humans.txt
    • John Francis 🦫🇨🇦🍁💪⬆️ (johnefrancis@cosocial.ca)'s status on Wednesday, 22-Jan-2025 05:26:18 JST
      in reply to
      • LWN.net

      @corbet @LWN sounds like you need an AI poisoner like Nepenthes or iocaine.

      In conversation about 4 months ago permalink
      Valerie Aurora repeated this.
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 05:27:32 JST
      in reply to
      • LWN.net
      • bignose
      @bignose @LWN We have gone far out of our way to never require JavaScript to read LWN; we're not going to go back on that now.
      In conversation about 4 months ago permalink
    • bignose (bignose@sw-development-is.social)'s status on Wednesday, 22-Jan-2025 05:27:33 JST
      in reply to
      • LWN.net

      Thank you @corbet and all at @LWN for continuing the work of providing the excellent #LWN.

      The "active defenses" against torrents of antisocial web-scraping bots have bad effects on users. They tend to be "if you don't allow JavaScript and cookies, you can't visit the site," even when the site itself works fine without them.

      I don't have a better defense to offer, but it's really closing off huge portions of the web that would otherwise be fine for secure browsers.

      It sucks. Sorry, and thank you.

      In conversation about 4 months ago permalink

    • Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 05:46:04 JST
      in reply to
      • LWN.net
      • K. Ryabitsev
      • HAMMER SMASHED FILESYSTEM 🇺🇦
      @lkundrak @monsieuricon @LWN It's a service we provide :)
      In conversation about 4 months ago permalink
    • HAMMER SMASHED FILESYSTEM 🇺🇦 (lkundrak@metalhead.club)'s status on Wednesday, 22-Jan-2025 05:46:05 JST
      in reply to
      • LWN.net
      • K. Ryabitsev

      @monsieuricon @LWN @corbet are you implying that there are models that are busy being trained to call someone a fuckface over misunderstanding of some obscure arm coprocessor register or respond with viro insults to the most unsuspecting victims?

      In conversation about 4 months ago permalink
    • K. Ryabitsev (monsieuricon@social.kernel.org)'s status on Wednesday, 22-Jan-2025 05:46:06 JST
      in reply to
      • LWN.net
      @corbet @LWN I feel your pain so much right now.
      In conversation about 4 months ago permalink
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 05:56:42 JST
      in reply to
      • LWN.net
      • Adelie
      @adelie @LWN Blocking a subnet is not hard; the harder part is figuring out *which* subnets without just blocking huge parts of the net as a whole.
      In conversation about 4 months ago permalink
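      The hard part Corbet describes, finding *which* subnets to block, can be sketched by rolling per-IP request counts up into /24 networks and flagging networks whose total crosses a threshold, even though no single address looks hot. This is an illustrative sketch, not LWN's actual tooling; the threshold, the /24 granularity, and the sample counts are all assumptions.

```python
# Illustrative sketch (not LWN's tooling): aggregate per-IP request
# counts into /24 networks and flag networks over a total threshold,
# even when every individual IP stays under any per-IP limit.
from collections import Counter
from ipaddress import ip_network

def abusive_subnets(ip_counts, threshold=1000):
    """Return {network: total_requests} for /24s at or over threshold."""
    per_net = Counter()
    for ip, count in ip_counts.items():
        per_net[ip_network(f"{ip}/24", strict=False)] += count
    return {net: total for net, total in per_net.items() if total >= threshold}

# 40 addresses in one /24, 50 requests each: no single IP stands out,
# but the subnet total of 2000 does.
counts = {f"203.0.113.{i}": 50 for i in range(1, 41)}
counts["198.51.100.7"] = 300  # a lone heavy reader stays unflagged
hot = abusive_subnets(counts)
```

      The open problem from the thread remains: picking the threshold and prefix length so that whole hosting providers, and their legitimate readers, don't get swept up.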
    • Adelie (adelie@darkpenguin.social)'s status on Wednesday, 22-Jan-2025 05:56:48 JST
      in reply to
      • LWN.net

      @corbet @LWN

      "Any kind of active defense is going to have to figure out how to block subnets rather than individual addresses, and even that may not do the trick."

      If you're using iptables, ipset can block individual IPs (hash:ip) and subnets (hash:net).

      I just set it up last night for my much-smaller-traffic instances; feel free to DM.

      https://ipset.netfilter.org/

      In conversation about 4 months ago permalink
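      The ipset approach Adelie mentions boils down to: create one hash:net set, add offending subnets to it, and hang a single iptables rule off the set. The sketch below only builds the command strings; the set name "badnets" is made up, and actually running these commands requires root on a Linux box with ipset installed.

```python
# Sketch of the ipset/iptables pattern described above: a hash:net set
# holds any number of subnets, and one iptables rule drops them all.
# The set name "badnets" is an assumption; execution requires root.
def ipset_block_commands(subnets):
    cmds = ["ipset create badnets hash:net"]
    cmds += [f"ipset add badnets {net}" for net in subnets]
    # -m set matches against the whole set; adding more subnets later
    # needs no further iptables changes.
    cmds.append("iptables -I INPUT -m set --match-set badnets src -j DROP")
    return cmds

cmds = ipset_block_commands(["203.0.113.0/24", "198.51.100.0/24"])
```

      The design point is the one Adelie is making: the set can grow to thousands of entries (hash lookup, not a linear rule chain) without touching the firewall rules again.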
    • Ronny Adsetts (ronnyadsetts@mastodon.social)'s status on Wednesday, 22-Jan-2025 06:14:17 JST
      in reply to
      • LWN.net

      @corbet @LWN would you be so kind as to write up whatever mitigations you come up with? I've been fighting this myself on our websites. You seeing semi-random user agents too?

      In conversation about 4 months ago permalink
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 06:14:17 JST
      in reply to
      • LWN.net
      • Ronny Adsetts
      @RonnyAdsetts @LWN The user agent field is pure fiction for most of this traffic.
      In conversation about 4 months ago permalink
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 07:05:41 JST
      in reply to
      • LWN.net
      • AndresFreundTec
      @AndresFreundTec @LWN Yes, a lot of really silly traffic. About 1/3 of it results in redirects from bots hitting port 80; you don't see them coming back with TLS, they just keep pounding their heads against the same wall.

      It is weird; somebody has clearly put some thought into creating a distributed source of traffic that avoids tripping the per-IP circuit breakers. But the rest of it is brainless.
      In conversation about 4 months ago permalink
    • AndresFreundTec (andresfreundtec@mastodon.social)'s status on Wednesday, 22-Jan-2025 07:05:42 JST
      in reply to
      • LWN.net

      @corbet @LWN Do you see a lot of pointlessly redundant requests? I see a lot of related-seeming IPs request the same pages over and over.

      In conversation about 4 months ago permalink
    • Ayo (ayo@lonely.town)'s status on Wednesday, 22-Jan-2025 17:37:37 JST
      in reply to
      • LWN.net
      • John Francis 🦫🇨🇦🍁💪⬆️

      @corbet @johnefrancis @LWN
      Struggling with likely the same bots over here. I deployed a similar tarpit* on a large-ish site a few days ago - taking care not to trap the good bots - but can't say it's been very successful. It might have taken some load off of the main site, but not nearly enough to make a difference.

      One more thing I'm considering is prefixing all internal links with a '/botcheck/' path for potentially suspicious visitors, setting a cookie on that page and stripping the prefix with JS. If the cookie is set when the /botcheck/ endpoint is hit, redirect to the proper page; otherwise tarpit them. This way the site would still work as long as the user has *either* JS or cookies enabled. Still not perfect, but slightly less invasive than most common active defenses.

      *) https://code.blicky.net/yorhel/infinite-slop

      In conversation about 4 months ago permalink

      Attachments

      1. infinite-slop
        from yorhel
        Random garbage web page generator
      Haelwenn /элвэн/ :triskell: likes this.
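      Ayo's /botcheck/ scheme can be sketched as a pure routing decision. The cookie name "checked" and the action labels are illustrative assumptions, not Ayo's actual implementation; the premise is that suspicious visitors are served pages whose internal links carry the /botcheck/ prefix, the /botcheck/ page sets the cookie, and client-side JS strips the prefix on later clicks.

```python
# Sketch of the /botcheck/ scheme described above. Cookie name
# "checked" and the action labels are illustrative assumptions.
PREFIX = "/botcheck"

def route(path, cookies):
    """Return (action, path) for a request from a suspicious visitor."""
    if not path.startswith(PREFIX + "/"):
        return ("serve", path)         # JS already stripped the prefix
    real = path[len(PREFIX):]          # "/botcheck/x" -> "/x"
    if cookies.get("checked") == "1":
        return ("redirect", real)      # no JS, but cookies work
    return ("tarpit", real)            # neither JS nor cookies: bot-ish
```

      This reproduces the property Ayo states: a visitor with JS never requests /botcheck/ URLs at all, a visitor with only cookies gets redirected to the real page, and a client with neither lands in the tarpit.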
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Wednesday, 22-Jan-2025 23:18:12 JST
      in reply to
      • LWN.net
      • Daniel Bovensiepen
      @daniel @LWN The problem with restricting reading to logged-in people is that it will surely interfere with our long-term goal to have the entire world reading LWN. We really don't want to put roadblocks in front of the people we are trying to reach.
      In conversation about 4 months ago permalink
      sergiodj likes this.
    • Daniel Bovensiepen (daniel@bovi.social)'s status on Wednesday, 22-Jan-2025 23:18:16 JST
      in reply to
      • LWN.net

      @corbet @LWN how about restricting reading to logged-in people only, and then blocking the bot requests early in the pipeline to reduce the load?

      In conversation about 4 months ago permalink
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Friday, 24-Jan-2025 23:44:25 JST
      in reply to
      • LWN.net
      • Michael K Johnson
      @mcdanlj @LWN What a lot of people are suggesting (Nepenthes and such) will work great against a single abusive robot. None of it will help much when tens of thousands of sites are grabbing a few URLs each. Most of them will never step into the honeypot, and the ones that do will not be seen again regardless.
      In conversation about 4 months ago permalink
    • Michael K Johnson (mcdanlj@social.makerforums.info)'s status on Friday, 24-Jan-2025 23:44:26 JST
      in reply to
      • LWN.net

      @corbet @LWN I'm wondering if a link that a human wouldn't click on, but an AI wouldn't know any better than to follow, could be used in the nginx configuration to serve AI robots differently from humans, while excluding search crawlers from that treatment. What such a link would look like would differ from site to site. That would require thought from every site, but it would also create diversity, which would make it harder to guard against on the scraper side, so it could be more effective.

      I might be an outlier here for my feelings on whether training genai such as LLMs from publicly-posted information is OK. It felt weird decades ago when I was asked for permission to put content I posted to usenet onto a CD (why would I care whether the bits were carried to the final reader on a phone line someone paid for or a CD someone paid for?) so it's not inconsistent in my view that I would personally feel that it's OK to use what I post publicly to train genai. (I respect that others feel differently here.)

      That said, I'm beyond livid at being the target of a DDoS, and other AI engines might end up being collateral damage as I try to protect my site for use by real people.

      In conversation about 4 months ago permalink
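      Johnson's trap-link idea reduces to a small filter: any client that fetches a site-specific hidden URL, one no human would click, gets flagged, with known-good crawlers exempted. A minimal sketch, with a made-up trap path and a user-agent allowlist; as Corbet notes upthread the user-agent field is pure fiction for this traffic, so a real exemption would need something stronger, like reverse-DNS verification of crawler IPs.

```python
# Minimal sketch of the trap-link idea: flag any client that fetches a
# hidden, site-specific URL. Trap path and crawler allowlist are made
# up; user-agent checks are weak since scrapers fake them freely.
TRAP_PATH = "/articles/definitely-not-a-trap"
GOOD_CRAWLERS = ("Googlebot", "bingbot")

def is_flagged(flagged_ips, ip, path, user_agent):
    """Record trap hits; return whether this client should be restricted."""
    if path == TRAP_PATH and not any(c in user_agent for c in GOOD_CRAWLERS):
        flagged_ips.add(ip)
    return ip in flagged_ips

flagged = set()
```

      Per-site variation in the trap path is the point Johnson makes: a scraper can't hard-code its way around thousands of different traps, though, as Corbet replies, a distributed scraper that only touches a few URLs per IP may never hit the trap at all.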


GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.