GNU social JP
GNU social JP is a Japanese GNU social server.
Conversation

Notices

  1. Embed this notice
    pistolero (p@fsebugoutzone.org)'s status on Friday, 06-Jun-2025 13:15:25 JST pistolero pistolero
    Someone was scraping the shit out of FediList (hard enough that the bandwidth was getting et) so I popped in to look at the logs. Nothing interesting: the problem was solved when I killed off Huawei Cloud's IP space. (You are not running Firefox 3.6 on Ubuntu 10 or 4.0b11pre on Windows Server you lazy motherfucker. Update your fake-ass UA strings.) But while I was in there I looked around a little more and apparently OpenAI was scraping it. I thought I'd told them, via robots.txt, to fuck off, so I checked the URL.

    Usually if I see a bot and I can't view the URL the bot's operator puts in the UA over Tor, I will just kill the bot. OpenAI won't show you the URL without JavaScript (the "blank white screen" fail), they block mothra, *and* they have apparently blocked my actual IP, because they are giving me 403s.

    Letting them redirect you from https://openai.com/searchbot to https://platform.openai.com/docs/bots/ and then run *literally* 6MB of JavaScript, though, will allow you to view the four paragraphs of text (plus a few links and the UA strings) at https://archive.is/cCuWn . This is next-level horseshit, they should ask their bot to write them a thing that puts text on a website GODDAMN.
    fuckinghell.png
    In conversation about 5 months ago from fsebugoutzone.org permalink

    Attachments


    1. https://media.freespeechextremist.com/rvl/full/def9da62a5a378de2eae73d264f3a5b9ecb30c61684bfae6d03048a66572cbff?name=fuckinghell.png



    • ✙ dcc :pedomustdie: :phear_slackware:, Phantasm and soberano like this.
    • Embed this notice
      Hertz (hertz@poa.st)'s status on Saturday, 07-Jun-2025 00:36:07 JST Hertz Hertz
      in reply to
      @p Sounds like they're an annoying pain in the ass.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 00:48:16 JST pistolero pistolero
      in reply to
      • Hertz
      @Hertz The reason they put the URL in there is that it's part of running a well-behaved bot: you explain what the bot is doing, how it handles robots.txt, how to contact the person/company operating it if there is a problem, etc. If it's not present, you can usually assume the bot isn't well-behaved. I think "can't even view this shit without jumping through hoops" is sufficiently evil, but I'm not crazy about them using my bandwidth to train a proprietary AI, and the fact that they use your resources and then charge for access to data they scrape is insulting.
      In conversation about 5 months ago permalink
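For readers who have not run a crawler: the "how it handles robots.txt" part of being well-behaved usually amounts to a check like the one below before every request. This is only a sketch using Python's standard `urllib.robotparser`; the robots.txt content and the bot token `ExampleAIBot` are made up for illustration and are not FediList's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is banned, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, bot_token: str, url: str) -> bool:
    """A polite crawler asks this before each request and obeys the answer."""
    return parser.can_fetch(bot_token, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for token in ("ExampleAIBot", "curl"):
        allowed = may_fetch(rules, token, "https://example.org/instance/foo")
        print(token, "allowed" if allowed else "disallowed")
```

Nothing enforces this on the server side; robots.txt only works when the bot chooses to run a check like this, which is why a missing or unreachable info URL is treated as a bad sign.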
    • Embed this notice
      Judge Dread (judgedread@poa.st)'s status on Saturday, 07-Jun-2025 01:12:50 JST Judge Dread Judge Dread
      in reply to
      @p OpenAI, unlike a real search engine, would be blocked by anyone aware that they were scraping because they're known thieves.

      So of course they obfuscate everything.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 01:18:11 JST pistolero pistolero
      in reply to
      • Judge Dread
      @judgedread Yeah, I'd 403'd one of their bots but they apparently run three. I have a piece of middleware in there that blacklists some UAs by substring. I was mostly using it for stuff like zgrab and semrush, but I tacked on openai.com so unless they change the URL or stop supplying it, they're dead.
      In conversation about 5 months ago permalink
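The post does not show the middleware itself, so the following is only a guess at its shape: a WSGI wrapper that answers 403 when the User-Agent contains a listed substring. The three substrings come from the post; the case-insensitive matching, the function names, and the WSGI framing are assumptions.

```python
from typing import Callable, Iterable

# Substrings named in the post. Case-insensitive matching is an assumption.
BLOCKED_UA_SUBSTRINGS = ("zgrab", "semrush", "openai.com")


def is_blocked(user_agent: str, needles: Iterable[str] = BLOCKED_UA_SUBSTRINGS) -> bool:
    """True if any blocked substring occurs anywhere in the User-Agent."""
    ua = user_agent.lower()
    return any(needle in ua for needle in needles)


def ua_blocklist_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app; reply 403 before the app runs if the UA is listed."""

    def wrapped(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        return app(environ, start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


if __name__ == "__main__":
    app = ua_blocklist_middleware(demo_app)
    # Illustrative UA strings, not copied from any log.
    for ua in ("curl/8.5.0", "ExampleBot/1.0; +https://openai.com/searchbot"):
        status = []
        app({"HTTP_USER_AGENT": ua}, lambda s, h: status.append(s))
        print(ua, "->", status[0])
```

Matching on the info URL rather than the bot name is what makes one entry cover all of an operator's bots, as long as they keep advertising the same domain.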

    • Embed this notice
      di0nysius the patomskyite (dsm@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 05:25:36 JST di0nysius the patomskyite di0nysius the patomskyite
      in reply to
      @p Say AI is the next big thing and never disagree with Musk again, or Uncle Limewire won't bless you with his wisdom.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 05:29:43 JST pistolero pistolero
      in reply to
      • di0nysius the patomskyite
      @dsm WE HAVE TO MAKE BETTER AISLOP THAN CHINA IS MAKING OR WINNING TEAM

      THIS REQUIRES A LOT OF H1Bs (do not ask why India is still not a tech superpower)

      THANK YOU ELON
      uncle-nintendo.jpg
      In conversation about 5 months ago permalink

      Attachments


      1. https://media.freespeechextremist.com/rvl/full/ce4f839ddc1555abf7f9e82ca8ada20b23dff5f9780782477207e9b208ff9282?name=uncle-nintendo.jpg
    • Embed this notice
      di0nysius the patomskyite (dsm@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 07:47:55 JST di0nysius the patomskyite di0nysius the patomskyite
      in reply to
      @p if your VCR blinks 12:00 3 times he will appear in a mirror and block you.
      In conversation about 5 months ago permalink
      Phantasm and pistolero like this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 07:49:09 JST pistolero pistolero
      in reply to
      • di0nysius the patomskyite
      @dsm I'M DISROOOOOOOOOPTING
      In conversation about 5 months ago permalink
    • Embed this notice
      SilverDeth (silverdeth@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 07:54:23 JST SilverDeth SilverDeth
      in reply to
      @p The companies training these Goddamned things should have to pay for our bandwidth.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 07:55:24 JST pistolero pistolero
      in reply to
      • SilverDeth
      @SilverDeth I should do what I am doing with the zaps motherfuckers and just 402 them.
      In conversation about 5 months ago permalink
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 08:01:05 JST pistolero pistolero
      in reply to
      • di0nysius the patomskyite
      @dsm Hey. You're talking like someone that is planning to redeem.

      :mokouno: Redeem
      :mokouyes: The needful
      c87609b2774fc03dd92f0136cd4f5d9f32e94eb121a2c71a03ca11a80932dcaf.jpg
      In conversation about 5 months ago permalink

      Attachments


      1. https://media.freespeechextremist.com/rvl/full/19781b7a8058f003b495026b7ed279c265f4c8119857c318caae27ec649ea283?name=c87609b2774fc03dd92f0136cd4f5d9f32e94eb121a2c71a03ca11a80932dcaf.jpg
    • Embed this notice
      di0nysius the patomskyite (dsm@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 08:01:08 JST di0nysius the patomskyite di0nysius the patomskyite
      in reply to
      @p

      Imagine, considering cockroach opinions.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      SilverDeth (silverdeth@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 08:02:22 JST SilverDeth SilverDeth
      in reply to
      @p Yes. But it's shit that you even have to.

      So they should be paying you for your time AND your bandwidth.

      Heh. Invoice the fuckers.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 08:05:45 JST pistolero pistolero
      in reply to
      • SilverDeth
      @SilverDeth Remember that guy that sent Google those nonsense invoices and they only got paid when he started asking for absurd amounts?
      In conversation about 5 months ago permalink
    • Embed this notice
      SilverDeth (silverdeth@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 08:34:01 JST SilverDeth SilverDeth
      in reply to
      @p Reading now.

      122 million... hahahaha.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      di0nysius the patomskyite (dsm@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 10:40:07 JST di0nysius the patomskyite di0nysius the patomskyite
      in reply to
      @p

      He sent push notifications when he "🤨" reacted Trump/Epstein shit. How petty. He doesn't even federate, and he put ads on the time-line.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Saturday, 07-Jun-2025 10:41:51 JST pistolero pistolero
      in reply to
      • di0nysius the patomskyite
      @dsm

      > He sent push notifications when he "🤨" reacted Trump/Epstein shit.

      Holy fuck.

      > He doesn't even federate, and he put ads on the time-line.

      There were ads on the timeline before he got there; I paid $20 for an ad campaign once.

      But he still doesn't even federate.
      nosnarf.jpg
      In conversation about 5 months ago permalink
    • Embed this notice
      Phantasm (phnt@fluffytail.org)'s status on Friday, 13-Jun-2025 01:52:10 JST Phantasm Phantasm
      in reply to
      @p
      >You are not running Firefox 3.6 on Ubuntu 10 or 4.0b11pre on Windows Server you lazy motherfucker. Update your fake-ass UA strings.
      Yeah, there were some funny ones I also saw coming from there when dealing with the Git scraping. A UA for IE6 or something, a few Symbian ones and some random Alcatel UA.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
       (mint@ryona.agency)'s status on Friday, 13-Jun-2025 02:20:36 JST  
      in reply to
      @p >You are not running Firefox 3.6 on Ubuntu 10
      You sayin'?
      Screenshot.png
      In conversation about 5 months ago permalink

      Attachments


      1. https://ryona.agency/media/f9/82/ce/f982cea586ca81d4825cf13dace0235c1c88ebd91c640fa100bff4f22bd01d8b.png?name=Screenshot.png
      pistolero likes this.
    • Embed this notice
       (mint@ryona.agency)'s status on Friday, 13-Jun-2025 02:24:40 JST  
      in reply to
      • tsoifan1997
      • Phantasm
      @sysrq @phnt @p I regularly get Sony Ericsson default WAP/GPRS browser user agents.
      In conversation about 5 months ago permalink
      Phantasm and pistolero like this.
    • Embed this notice
      tsoifan1997 (sysrq@lab.nyanide.com)'s status on Friday, 13-Jun-2025 02:24:41 JST tsoifan1997 tsoifan1997
      in reply to
      • Phantasm
      @phnt @p Alcatel mentioned
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      wuhan.bat™ (jae@darkdork.dev)'s status on Friday, 13-Jun-2025 02:29:22 JST wuhan.bat™ wuhan.bat™
      in reply to
      • 
      • tsoifan1997
      • Phantasm
      @mint @phnt @p @sysrq and all i get is ie6 sigs
      In conversation about 5 months ago permalink
      , Phantasm and pistolero like this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 03:39:25 JST pistolero pistolero
      in reply to
      • Phantasm
      @phnt Yeah, maybe same guy, or at least same UA list.
      In conversation about 5 months ago permalink
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 03:41:15 JST pistolero pistolero
      in reply to
      • 
      @mint How did you get that ancient-ass openssl lib to load the avatars?
      In conversation about 5 months ago permalink
       likes this.
    • Embed this notice
      m0xEE (m0xee@nosh0b10.m0xee.net)'s status on Friday, 13-Jun-2025 03:47:01 JST m0xEE m0xEE
      in reply to
      • Hertz
      @p@fsebugoutzone.org @Hertz@poa.st
      Call me when it's finally time to start, you know… doing that thing to the data centres 😇
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 03:48:00 JST pistolero pistolero
      in reply to
      • Hertz
      • m0xEE
      @m0xEE @Hertz I'd seize them, maybe, if that was the plan. I can't burn that much cool gear.
      In conversation about 5 months ago permalink
    • Embed this notice
       (mint@ryona.agency)'s status on Friday, 13-Jun-2025 03:53:21 JST  
      in reply to
      @p I did not, look at the URL. Made a bunch of nossl hosts on the local server including mediaproxy.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      Phantasm (phnt@fluffytail.org)'s status on Friday, 13-Jun-2025 03:54:36 JST Phantasm Phantasm
      in reply to
      @p It's probably government-run, since they exclusively used Chinese companies with servers all around the world. Huawei was from Singapore, Hong Kong (I think) and some random country in Africa. Alibaba came exclusively from the US using their LLC.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 04:06:59 JST pistolero pistolero
      in reply to
      • 
      @mint Ah, media proxy; okay.
      In conversation about 5 months ago permalink
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 04:09:53 JST pistolero pistolero
      in reply to
      • Phantasm
      @phnt The post from yesterday ( https://fsebugoutzone.org/notice/Av2buWqTPJFgaPRNcu ) has a list of IPs attached; a lot of residential US IPs.
      In conversation about 5 months ago permalink

      Attachments

      1. pistolero (@p@fsebugoutzone.org): “The awk script has eaten about 7200 individual IPs (not counting the /12s and /16s and a lot of /22s and /24s because apparently Tencent Cloud owns a huge number of blocks) and the tardbot is still...”
    • Embed this notice
      di0nysius the patomskyite (dsm@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 04:18:23 JST di0nysius the patomskyite di0nysius the patomskyite
      in reply to
      • Phantasm
      @p @phnt

      "Why is my TP-Link calling home?"
      In conversation about 5 months ago permalink
      Phantasm and pistolero like this.
    • Embed this notice
      Phantasm (phnt@fluffytail.org)'s status on Friday, 13-Jun-2025 04:21:48 JST Phantasm Phantasm
      in reply to
      • di0nysius the patomskyite
      @dsm @p TP-Link: That is intended behavior :-)

      I should look through my logs for the new hipster named ASUS vuln whose name I forgot.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      Phantasm (phnt@fluffytail.org)'s status on Friday, 13-Jun-2025 04:30:41 JST Phantasm Phantasm
      in reply to
      • cjd
      @cjd @p I don't know how this scraping op behaves, but the one I encountered about 2 (?) months ago used the same strategy that SSH scanners use: do one request with one IP and then dump it for hours. Rate-limiting is ineffective unless you do it per-subnet, and since in p's case they are using residential proxies, that's also barely possible without dropping legitimate traffic.
      In conversation about 5 months ago permalink
      pistolero likes this.
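To make "per-subnet" concrete, here is a rough sliding-window limiter keyed on the containing network instead of the single address. The /24 and /64 bucket sizes, the class name, and the limits are all assumptions picked for illustration. The caveat from the post still applies: with residential proxies, a bucket this wide can catch legitimate users too.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Optional


def subnet_key(ip: str, v4_prefix: int = 24, v6_prefix: int = 64) -> str:
    """Map an address to the network that contains it (the rate-limit bucket)."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


class SubnetRateLimiter:
    """Allow at most `limit` requests per subnet within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # subnet -> timestamps of accepted requests

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        queue = self.hits[subnet_key(ip)]
        # Forget requests that have aged out of the window.
        while queue and now - queue[0] >= self.window:
            queue.popleft()
        if len(queue) >= self.limit:
            return False
        queue.append(now)
        return True


if __name__ == "__main__":
    limiter = SubnetRateLimiter(limit=2, window=60)
    # Three different IPs from one /24: the third is refused even though
    # each individual address has only made a single request.
    for ip in ("203.0.113.1", "203.0.113.2", "203.0.113.3"):
        print(ip, limiter.allow(ip, now=0.0))
```

A per-IP limiter would have let all three through, which is exactly the one-request-per-address pattern described above.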
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 04:30:42 JST cjd cjd
      in reply to
      • Phantasm
      If they're not actually like syn flooding you, you can just do a per-ip rate limit in nginx, I use a lot of these to slow down the bots.

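      If they're actually DDoSing (or scraping with way too many IPs simultaneously), I'd recommend ipset over individual -j DROP rules, since those rules are evaluated sequentially while a set lookup is a single hash match...

      # create the set first (hash:ip holds single addresses; hash:net would take CIDR blocks)
      ipset create blacklist hash:ip
      ipset add blacklist <ip>
      # then one rule covers every entry in the set
      iptables -A INPUT -m set --match-set blacklist src -j DROP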
      In conversation about 5 months ago permalink
    • Embed this notice
      Phantasm (phnt@fluffytail.org)'s status on Friday, 13-Jun-2025 04:31:39 JST Phantasm Phantasm
      in reply to
      • cjd
      • Phantasm
      @cjd @p Edit: Disregard
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 04:46:57 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @cjd @phnt

      > If they're not actually like syn flooding you,

      They don't arrive in the webserver logs if it's a syn flood.

      > you can just do a per-ip rate limit in nginx

      I do this already. When I say "a lot of residential US IPs", I mean a lot of them. I mean that I killed off about 10k IPs individually so far, not counting the "all of tencent cloud" IPs.
      In conversation about 5 months ago permalink
      Nietzschean Ekko Enjoyer repeated this.
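With roughly 10k individually blocked addresses, it can help to see how many of them cluster into the same networks. The sketch below is not the awk script from the thread (that script is not shown); it is an illustrative alternative using Python's `ipaddress` module, IPv4 only, with a made-up threshold of three offenders per /24.

```python
import ipaddress
from collections import Counter


def summarize(ips, prefix=24, threshold=3):
    """Split offenders into (networks, leftovers).

    networks:  subnets holding at least `threshold` distinct offending IPs,
               merged into larger blocks where they are adjacent.
    leftovers: offending IPs that did not reach the threshold in their subnet.
    """
    addrs = {ipaddress.ip_address(ip) for ip in ips}

    def net_of(addr):
        return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

    counts = Counter(net_of(a) for a in addrs)
    hot = {net for net, n in counts.items() if n >= threshold}
    leftovers = sorted(a for a in addrs if net_of(a) not in hot)
    # collapse_addresses merges neighbours, e.g. two adjacent /24s into one /23.
    networks = list(ipaddress.collapse_addresses(hot))
    return networks, leftovers


if __name__ == "__main__":
    # Documentation-range addresses standing in for real log entries.
    offenders = [
        "198.51.100.1", "198.51.100.2", "198.51.100.3",
        "198.51.101.1", "198.51.101.2", "198.51.101.3",
        "203.0.113.9",
    ]
    networks, leftovers = summarize(offenders)
    print("block as networks:", [str(n) for n in networks])
    print("block individually:", [str(a) for a in leftovers])
```

For cloud ranges this shrinks the list a lot; for scattered residential addresses it mostly will not, which matches the experience described in the thread.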
    • Embed this notice
      m0xEE (m0xee@nosh0b10.m0xee.net)'s status on Friday, 13-Jun-2025 04:54:20 JST m0xEE m0xEE
      in reply to
      • Hertz
      @p@fsebugoutzone.org @Hertz@poa.st
      Yeah, and realising how all this hardware gets utilised every day only to harvest more random text to feed it to a thing that they expect one day to tell them "42" irks me to a great degree 😩
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 05:26:52 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @cjd @phnt ...And now all of EC2 and Akamai. Gaddame.
      In conversation about 5 months ago permalink
      Phantasm likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 05:28:22 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @cjd @phnt Someone's either gone to a lot of trouble acquiring a free botnet or spent a lot of money on several "cloud" computing services and buying botnets.
      In conversation about 5 months ago permalink
      Phantasm likes this.
    • Embed this notice
      Nietzschean Ekko Enjoyer (r000t@ligma.pro)'s status on Friday, 13-Jun-2025 05:36:36 JST Nietzschean Ekko Enjoyer Nietzschean Ekko Enjoyer
      in reply to
      • cjd
      • Phantasm

      @cjd @phnt @p So, a few months ago, I was frantically scrolling your profile and my mentions looking for this.

      Now I gotta frantically scroll for the person who was asking for it.

      Some day it will all line up.

      In conversation about 5 months ago permalink
      Phantasm and pistolero like this.
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 05:36:38 JST cjd cjd
      in reply to
      • cjd
      • Phantasm
      Err, almost forgot shameless self promotion : https://github.com/cjdelisle/big_download
      In conversation about 5 months ago permalink

      Attachments

      1. GitHub - cjdelisle/big_download: Big Download - Node express middleware for zipbombing vuln scanners
      Phantasm and pistolero like this.
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 05:36:39 JST cjd cjd
      in reply to
      • Phantasm
      Hmm...

      If you're able to detect via UA (which I guess you are) and they're sending Accept gzip header, you can just send them compressed nulls. If they don't send an Accept gzip header, well, you can require it because all normal browsers are able to do that...
      In conversation about 5 months ago permalink
      pistolero likes this.
      pistolero repeated this.
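A minimal sketch of the idea, assuming the client has already been flagged by some other check such as a UA match: if it advertises gzip, hand it gzip-compressed zero bytes; if it does not, refuse. The helper names, the 10 MiB default, and the 406 status for the refusal are choices made here for illustration, not something stated in the thread.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """True if Accept-Encoding lists gzip with a non-zero q-value.

    Simplified on purpose: wildcards ("*") are not treated as gzip.
    """
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def null_payload(size: int) -> bytes:
    """Gzip `size` zero bytes; the output is about a thousandth of `size`."""
    return gzip.compress(bytes(size), compresslevel=9)


def respond(flagged: bool, accept_encoding: str, size: int = 10 * 1024 * 1024):
    """Return (status, headers, body) for one request."""
    if not flagged:
        return "200 OK", [("Content-Type", "text/plain")], b"normal page\n"
    if not accepts_gzip(accept_encoding):
        # Fallback from the post: insist on gzip. The status code is a guess.
        return "406 Not Acceptable", [("Content-Type", "text/plain")], b"gzip required\n"
    body = null_payload(size)
    headers = [
        ("Content-Type", "text/html"),
        ("Content-Encoding", "gzip"),
        ("Content-Length", str(len(body))),
    ]
    return "200 OK", headers, body


if __name__ == "__main__":
    status, headers, body = respond(True, "gzip, deflate, br")
    print(status, "sent", len(body), "bytes, expands to", 10 * 1024 * 1024)
```

Only flagged clients are touched here, so ordinary curl or feed readers that skip the Accept-Encoding header would still get a normal response.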
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 05:42:29 JST cjd cjd
      in reply to
      • Nietzschean Ekko Enjoyer
      • Phantasm
      It's ironic, because nobody ever fucks with any of my services for some reason ¯\_(ツ)_/¯

      Knock on wood, because as much as I enjoy torturing bots, I enjoy being lazy more...
      In conversation about 5 months ago permalink
      ✙ dcc :pedomustdie: :phear_slackware: and pistolero like this.
    • Embed this notice
      Phantasm (phnt@fluffytail.org)'s status on Friday, 13-Jun-2025 05:49:11 JST Phantasm Phantasm
      in reply to
      • cjd
      @p @cjd ... With seemingly no end result. Like what are they gonna do, find a list of fedi instances (to scrape)? I wonder if it's some more general badly written scraper that for some reason found out about FediList and scrapes it like a normal crawler would.
      In conversation about 5 months ago permalink
      ✙ dcc :pedomustdie: :phear_slackware: and pistolero like this.
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 05:50:15 JST cjd cjd
      in reply to
      • Phantasm
      Probably "L7 DDoS" service. They can't hit too hard because they get instabanned off the hosting provider, so they're slow-rolling...
      In conversation about 5 months ago permalink
      Phantasm and pistolero like this.
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 05:54:17 JST cjd cjd
      in reply to
      • Phantasm
      Come scrape Pkteerium...
      In conversation about 5 months ago permalink

      Attachments


      1. https://pkteerium.xyz/media/88ee5587a1eac036fa3fedde98720736781e99e0d1f09bc9f43b6469a2e5ae13.png
      ✙ dcc :pedomustdie: :phear_slackware:, Phantasm and pistolero like this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 05:59:09 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @cjd @phnt I wanna avoid hosing anyone's RSS feeds or curl or whatever; FediList is intended to be used like that. I have the problem more or less solved, the main thing is who and why.
      In conversation about 5 months ago permalink
      Phantasm likes this.
    • Embed this notice
      wuhan.bat™ (jae@darkdork.dev)'s status on Friday, 13-Jun-2025 05:59:26 JST wuhan.bat™ wuhan.bat™
      in reply to
      • cjd
      • Phantasm
      @p @phnt @cjd sending you some base configurations that i build off of. adjust as you see fit.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 06:07:25 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @cjd @phnt Sure it's a DDoS instead of just a scraper?
      In conversation about 5 months ago permalink
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 06:08:39 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @phnt @cjd That's my suspicion.
      In conversation about 5 months ago permalink
      Phantasm likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 06:10:06 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @phnt @cjd Oh, and they are retaining the URLs and processing them, because they are following links generated on the pages.
      In conversation about 5 months ago permalink
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 06:13:52 JST cjd cjd
      in reply to
      • Phantasm
      I wonder if you can pad out every html file with like 500MB of a repeating character in a hidden CDATA or something...

      Compressed, it adds like 1k to each page load and the average browser/device will not care in the least bit (that's another issue for another day), but the scraper MIGHT end up trying to store it all...
      In conversation about 5 months ago permalink
      pistolero likes this.
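A quick way to check the numbers: DEFLATE, which gzip uses, tops out at roughly 1032:1, so 500 MB of one repeated byte should come to around half a megabyte on the wire rather than 1k. That is still small next to what the client has to inflate. The sketch below streams a padded page through `zlib` so the padding never sits in memory all at once. It uses an HTML comment instead of CDATA, and the function name and chunk size are assumptions.

```python
import zlib

# Theoretical ceiling for DEFLATE: 258-byte matches at about 2 bits each.
DEFLATE_MAX_RATIO = 1032


def padded_page_gzip(head: bytes, tail: bytes, pad_bytes: int, chunk: int = 1 << 20):
    """Yield a gzip stream of head + <!-- padding --> + tail, chunk by chunk."""
    # wbits = 16 + MAX_WBITS selects the gzip container instead of raw zlib.
    comp = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    yield comp.compress(head + b"<!--")
    block = b" " * chunk
    remaining = pad_bytes
    while remaining > 0:
        n = min(chunk, remaining)
        yield comp.compress(block[:n])
        remaining -= n
    yield comp.compress(b"-->" + tail)
    yield comp.flush()


if __name__ == "__main__":
    pad = 50 * 1024 * 1024  # 50 MiB here; the post suggests 500 MB
    sent = sum(len(piece) for piece in padded_page_gzip(b"<html><head>", b"</head></html>", pad))
    print(f"padding: {pad} bytes, on the wire: {sent} bytes, ratio ~{pad // sent}:1")
    print(f"estimate for 500 MB: ~{500_000_000 // DEFLATE_MAX_RATIO} bytes compressed")
```

Whether a given scraper actually stores or parses the inflated page is unknown, as the post says; the sketch only shows the cost on the sending side.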
    • Embed this notice
      Phantasm (phnt@fluffytail.org)'s status on Friday, 13-Jun-2025 06:16:52 JST Phantasm Phantasm
      in reply to
      • cjd
      @p @cjd It would also align with the issue I had with Gitea. They also processed responses and followed up with requests, which ended up with the repo sizes ballooning thanks to them downloading the zip and tar archives along with git bundles and every commit diff page they found out about for every public repo.

      I didn't see EC2 and Akamai, but I did see a bunch of IPs belonging to Google's usercontent separation last month.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 06:17:41 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @cjd @phnt HA! That is a fun idea. I could just put it in the <head> before any of the metadata shows up. Even if they're not storing it, that's a lot of data to send through the pipe, and I could just do it inline.
      In conversation about 5 months ago permalink
    • Embed this notice
      di0nysius the patomskyite (dsm@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 06:18:06 JST di0nysius the patomskyite di0nysius the patomskyite
      in reply to
      • cjd
      • Phantasm
      @cjd @p @phnt

      Local man discovers z-bombing.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      laurel (laurel@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 06:20:19 JST  laurel laurel
      in reply to
      • cjd
      • Phantasm
      @p @phnt @cjd

      The scraping industry has become quite diverse regarding vertical integration lately.
      There are:
      - "legitimate" companies selling residential and mobile proxies
      - full service companies where you provide a url and you receive the scrape result, some with an API frontend.
      (regarding the full service firms the pricing is directly correlated to anti-bot overcoming techniques)
      - There are a couple of sites like multi-vendor marketplaces for scrapers. Vendors will provide the spider program, the site operator will run it, and the user will pay per result

      It could be one of these fully automated firms working for OpenAI. Many people have the illusion that these neural algorithms are intelligent monoliths whereas it is usually the combination of many neural nets alongside traditional web infrastructure.
      In conversation about 5 months ago permalink
      Phantasm and pistolero like this.
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 06:21:26 JST cjd cjd
      in reply to
      • Phantasm
      Not sure of anything, but the way they're picking up IPs all over the world, kind of smells like it....
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 06:28:17 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @cjd @phnt I am trying to support everything. The intent is to provide a service for understanding fedi.

      https://fedilist.com/p/about
      https://blog.freespeechextremist.com/blog/about-fedilist.html
      In conversation about 5 months ago permalink

      Attachments

      1. Some Notes about FediList (and Poisoned Data) — FSE Blog
      2. CFedi
      ✙ dcc :pedomustdie: :phear_slackware: likes this.
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 06:28:19 JST cjd cjd
      in reply to
      • Phantasm
      Are you trying to support like curl requests or something, or can you require everyone to Accept gzip ?
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 06:36:22 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @cjd @phnt

      > because "everything" includes scrapers

      Only if you decide "everything" means "every possible thing" instead of "I do not want to assume that the client is not some dip's PHP code".

      > Maybe if no Accept header, bounce them to a page which tells them to send Accept or else request a **gulp** API token (?)

      Nah, see the blog post; I don't wanna do anything like that.
      In conversation about 5 months ago permalink
    • Embed this notice
      cjd (cjd@pkteerium.xyz)'s status on Friday, 13-Jun-2025 06:36:23 JST cjd cjd
      in reply to
      • Phantasm
      Hmm tricky, because "everything" includes scrapers.

      Maybe if no Accept header, bounce them to a page which tells them to send Accept or else request a **gulp** API token (?)
      In conversation about 5 months ago permalink
    • Embed this notice
      Phantasm (phnt@fluffytail.org)'s status on Friday, 13-Jun-2025 06:50:31 JST Phantasm Phantasm
      in reply to
      • cjd
      @p @cjd I think you could do blocking based on UA for older browsers (like ~2 year old versions) on certain endpoints you would encounter from a browser like: /instance; graphs; search; the stats/hockey stick ones. Anybody that goes there with older browsers or things like curl/wget gets nullrouted for like a day. There isn't a reason to go there without a browser (even something like links/lynx should get whitelisted). And leave only the RSS/Prometheus endpoints wide open.

      I think at this point it's better to provide a degraded service than one that barely works at all, if your infra can't keep up currently. Also you are probably already doing that and I just missed it/forgot about it. I guess that's what the awk script does.
      In conversation about 5 months ago permalink
      pistolero likes this.
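One way to read this suggestion as code: check the browser version claimed in the UA, but only on pages that are meant for browsers, and leave feeds and metrics alone. The path prefixes and version cutoffs below are placeholders, not FediList's real routes or policy. Unlike the proposal above, this sketch lets curl/wget through everywhere, since a UA without a browser token is never considered stale.

```python
import re

# Hypothetical prefixes standing in for the browser-oriented pages in the post.
BROWSER_ONLY_PREFIXES = ("/instance", "/search", "/graphs")
# Assumed cutoffs; a real deployment would have to keep these current.
MIN_MAJOR = {"Firefox": 115, "Chrome": 115}
_VERSION_RE = re.compile(r"\b(Firefox|Chrome)/(\d+)")


def stale_browser(user_agent: str) -> bool:
    """True if the UA claims a Firefox/Chrome major version below the cutoff."""
    match = _VERSION_RE.search(user_agent)
    return bool(match) and int(match.group(2)) < MIN_MAJOR[match.group(1)]


def should_block(path: str, user_agent: str) -> bool:
    """Block only stale browser UAs, and only on browser-oriented pages."""
    if not path.startswith(BROWSER_ONLY_PREFIXES):
        return False  # RSS, Prometheus and everything else stay open
    return stale_browser(user_agent)


if __name__ == "__main__":
    old_firefox = (
        "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2) "
        "Gecko/20100115 Ubuntu/10.04 Firefox/3.6"
    )
    for path, ua in (
        ("/instance/example.org", old_firefox),
        ("/feed.rss", old_firefox),
        ("/instance/example.org", "curl/8.5.0"),
    ):
        print(path, "|", ua[:24], "| block:", should_block(path, ua))
```

A scraper that simply updates its fake UA list defeats this, so it is at best one signal among several, alongside things like an unexpected X-Forwarded-For header.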
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Friday, 13-Jun-2025 07:01:58 JST pistolero pistolero
      in reply to
      • cjd
      • Phantasm
      @phnt @cjd

      > I think you could do blocking based on UA for older browsers

      Yeah, that, plus X-Forwarded-For, etc. Look at the line in the image in the thread where you were CC'd, there are a lot of things that stick out.

      > Anybody that goes there with older browsers or things like curl/wget gets nullrouted for like a day

      curl/wget are fine. They are the use-case, more than a browser. The web interface is intended for both people and machines.

      > Also you are probably already doing that

      Indeed. The problem is solved: the mystery is not.
      In conversation about 5 months ago permalink
      Phantasm likes this.
    • Embed this notice
      Judge Dread (judgedread@poa.st)'s status on Friday, 13-Jun-2025 08:28:37 JST Judge Dread Judge Dread
      in reply to
      • laurel
      @laurel @p One of the AI art companies just got busted by Disney scraping corporate sites whenever someone prompted for one of their characters.
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      þernia (pernia@cum.salon)'s status on Sunday, 15-Jun-2025 06:00:05 JST þernia þernia
      in reply to
      @p heh, that was me. mb buddy
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Sunday, 15-Jun-2025 06:01:10 JST pistolero pistolero
      in reply to
      • þernia
      @pernia Which thing was you? You rented a bunch of Huawei/Tencent machines to pretend to be running Firefox 3, or you wrote the 6MB of JS on https://platform.openai.com/docs/bots/ ?
      In conversation about 5 months ago permalink

    • Embed this notice
      þernia (pernia@cum.salon)'s status on Sunday, 15-Jun-2025 06:01:42 JST þernia þernia
      in reply to
      • di0nysius the patomskyite
      @p @dsm Elon and Ian probably never played mafia
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Sunday, 15-Jun-2025 06:02:42 JST pistolero pistolero
      in reply to
      • þernia
      • di0nysius the patomskyite
      @pernia @dsm Not real gamers. Cheong himself may qualify as a fake gamer girl.
      ianmileschongindrag.png
      ianmileschongreplace.png
      In conversation about 5 months ago permalink

      Attachments


      1. https://media.freespeechextremist.com/rvl/full/81d908b20bd7e1e71847db681ac8e198fb2ac5335caee46a43db332238e64559?name=ianmileschongindrag.png

      2. https://media.freespeechextremist.com/rvl/full/538e08f451309d34c1d627cb060a2c28d084b7a97d21d50803adfa36fff5bf94?name=ianmileschongreplace.png
      pwm likes this.
    • Embed this notice
      tsoifan1997 (sysrq@lab.nyanide.com)'s status on Sunday, 15-Jun-2025 06:07:07 JST tsoifan1997 tsoifan1997
      in reply to
      • þernia
      • di0nysius the patomskyite
      @p @pernia @dsm gigatrvke
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      þernia (pernia@cum.salon)'s status on Sunday, 15-Jun-2025 06:12:15 JST þernia þernia
      in reply to
      @p yep
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Sunday, 15-Jun-2025 06:12:19 JST pistolero pistolero
      in reply to
      • þernia
      @pernia Both of them?
      In conversation about 5 months ago permalink
    • Embed this notice
      þernia (pernia@cum.salon)'s status on Sunday, 15-Jun-2025 06:13:14 JST þernia þernia
      in reply to
      • di0nysius the patomskyite
      @p @dsm bruh
      In conversation about 5 months ago permalink
      pistolero likes this.
    • Embed this notice
      pistolero (p@fsebugoutzone.org)'s status on Sunday, 15-Jun-2025 06:20:57 JST pistolero pistolero
      in reply to
      • þernia
      @pernia The answers are important.
      In conversation about 5 months ago permalink
    • Embed this notice
      þernia (pernia@cum.salon)'s status on Sunday, 15-Jun-2025 06:20:58 JST þernia þernia
      in reply to
      @p you ask a lotta questions
      In conversation about 5 months ago permalink


GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

Creative Commons Attribution 3.0 All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.