GNU social JP
GNU social JP is a Japanese GNU social server.

Conversation

Notices

  1. Jonathan Corbet (corbet@social.kernel.org)'s status on Thursday, 23-Jan-2025 08:20:50 JST
    A followup for folks who are curious about the whole AI botswarm problem...

    Some of these bots are clearly running on a bunch of machines on the same net. I have been able to reduce the traffic significantly by treating everything as a class-C net and doing subnet-level throttling. That and simply blocking a couple of them.
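The subnet-level throttling described above could look roughly like the following. This is a minimal sketch, not the configuration actually used on the site: the `SubnetThrottle` class, its parameters, and the sliding-window policy are all assumptions made for illustration, and it only makes sense for IPv4, where a "class-C net" corresponds to a /24 prefix.

```python
import ipaddress
import time
from collections import defaultdict, deque


class SubnetThrottle:
    """Sliding-window rate limiter keyed by /24 network instead of by address.

    Hypothetical helper for illustration; limits and window are assumptions.
    """

    def __init__(self, max_hits, window_seconds, prefix_len=24):
        self.max_hits = max_hits
        self.window = window_seconds
        self.prefix_len = prefix_len
        # One queue of hit timestamps per subnet.
        self.hits = defaultdict(deque)

    def subnet_of(self, ip):
        # strict=False masks off the host bits: 203.0.113.57 -> 203.0.113.0/24
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def allow(self, ip, now=None):
        """Return True if a request from `ip` is within its subnet's budget."""
        now = time.monotonic() if now is None else now
        queue = self.hits[self.subnet_of(ip)]
        # Drop timestamps that have aged out of the window.
        while queue and now - queue[0] >= self.window:
            queue.popleft()
        if len(queue) >= self.max_hits:
            return False
        queue.append(now)
        return True
```

With `SubnetThrottle(max_hits=2, window_seconds=60)`, two requests from any addresses in 203.0.113.0/24 are allowed within a minute and a third is refused, while an address in a different /24 is unaffected. The trade-off is that legitimate users who share a /24 with a bot get throttled along with it.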

    But that leaves a lot of traffic with an interesting characteristic: there are millions of obvious bot hits (following a pattern through the site, for example) that each come from a different IP. An access log with 9M lines has over 1M IP addresses, and few of them appear more than about three times.
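A measurement like the one above can be reproduced with a short script. The sketch below assumes a log format in which the client address is the first whitespace-separated field of each line (as in the common and combined log formats); the function names and the threshold of three hits are illustrative choices, not something taken from the post.

```python
from collections import Counter


def hits_per_ip(lines):
    """Count requests per client address, taking the first field as the IP."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def summarize(counts, threshold=3):
    """Report how many addresses were seen at most `threshold` times."""
    distinct = len(counts)
    rare = sum(1 for n in counts.values() if n <= threshold)
    return {
        "total_hits": sum(counts.values()),
        "distinct_ips": distinct,
        "ips_at_or_below_threshold": rare,
        "share_rare": rare / distinct if distinct else 0.0,
    }
```

To run it over a real file, pass the open file object directly, for example `summarize(hits_per_ip(open("access.log")))`, where `access.log` is a placeholder path. A `share_rare` value close to 1.0 is the signature described in the post: almost every address shows up only a handful of times, so per-address rate limits never trigger.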

    So these things are running on widely distributed botnets, likely on compromised computers, and they are doing their best to evade any sort of recognition or throttling. I don't think that any sort of throttling or database of known-bot IPs is going to help here...not quite sure what to do about it.

    What a world we have made for ourselves...
    In conversation about 4 months ago from social.kernel.org permalink
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Thursday, 23-Jan-2025 12:25:08 JST
      in reply to
      • K. Ryabitsev
      • smxi
      @smxi @monsieuricon Suggestions for these countermeasures - and how to apply them without hosing legitimate users - would be much appreciated. I'm glad they are obvious to you, please do share!
      In conversation about 4 months ago permalink
    • smxi (smxi@fosstodon.org)'s status on Thursday, 23-Jan-2025 12:25:09 JST
      in reply to
      • K. Ryabitsev

      @monsieuricon @corbet so you know the behavior and the pattern. Construct countermeasures. I'm honestly astounded to see guys close to the kernel unable to do this. Think like your opponent. Find his weak spots. Nothing has changed since Sun Tzu made his observations. All bots have weak spots.

      In conversation about 4 months ago permalink
    • K. Ryabitsev (monsieuricon@social.kernel.org)'s status on Thursday, 23-Jan-2025 12:25:10 JST
      in reply to
      • smxi
      @smxi @corbet we're kinda trying to tell you that a single IP will hit 2-3 times an hour or so. You can't do behavioural analysis over 3 hits. They request 2-3 specific URLs with generic browser client strings and then aren't seen again. But multiply this by tens of thousands of IPs all coming from different subnets and you have a problem.
      In conversation about 4 months ago permalink
    • smxi (smxi@fosstodon.org)'s status on Thursday, 23-Jan-2025 12:25:11 JST
      in reply to

      @corbet IP based blocks have been useless for decades. Block behaviors. Most bots cost money to run via bot net rental fees.

      In conversation about 4 months ago permalink
    • Jonathan Corbet (corbet@social.kernel.org)'s status on Thursday, 23-Jan-2025 23:39:10 JST
      in reply to
      • penguin42
      @penguin42 They don't tell me what they are doing with the data... the distributed scraping is an easily observable fact, though. Perhaps they are firehosing the data back to the mothership for training?
      In conversation about 4 months ago permalink
    • penguin42 (penguin42@mastodon.org.uk)'s status on Thursday, 23-Jan-2025 23:39:11 JST
      in reply to

      @corbet I'm trying to think of the AI training that would be using compromised hosts for scraping; I thought for training you had to do the training part on one or a small number of tightly coupled hosts; so then what is it?

      In conversation about 4 months ago permalink

GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.