@ins0mniak There are like a million of these retarded "nmap-as-a-service" companies; it must be pretty fuckin' lucrative. (I'm pretty sure Palo Alto Networks is a fed operation, though.)
> It's somewhere between spam and data broker on the scale of people fucking up the internet.
Yeah, if the data were public, I wouldn't care as much. I don't care if Shodan nmaps me. But these fucking fuckers, not just scammy but retarded. Like one time FSE slowed down because they were hammering a closed port, they refused to accept that I was not going to open up port 445 and they were sending so many connection attempts that the majority of the bandwidth was those fuckers.
@p@ins0mniak These "people" all use the big 3 tech clouds to host their scrapers. Block their entire ASN and you avoid a lot of grief. Make exceptions if you have to but anyone who hosts their fedi server on azure, aws or gcp is not worth federating with. The few I've seen who I'm blocking this way are your typical shitlib mastodon site or g*rmans. No big loss.
@dj@p I've seen that too. Remember that all journalist masto instance? Dumbass who started it was hosting on AWS. The dude was up for less than a week and was asking people to donate to cover his "$3000" a month hosting costs.
Granted he was being a little bitch and trying to get gibs but still.....AWS? christ with these people.
@p@ins0mniak >I was not going to open up port 445 and they were sending so many connection attempts that the majority of the bandwidth was those fuckers. I had an angry moment over Chinese scrapers two weeks ago after I promptly nullrouted half of Huawei Cloud two weeks before that. They thought it would be great to switch to Alibaba US and hammer my Gitea instance with requests for every file in most repos and asking for every revision of those files. And in typical fashion, when they didn't receive a response in time (obvious when you are sending ~15r/s to a small server), they just closed the connection and tried again in 30 minutes while still scraping other files.
And they have the audacity to use normal browser UAs from a randomized selection of a few, making them very hard to block in an easy way. Claude on the other hand completely ignores the meta tag and robots.txt, but at least they have "ClaudeBot" in the UA making them trivially blockable in nginx. That said, Claude is also dumb in a different way. They send requests for issues with numbers in the thousands and never stop when literally all of them return a 404.
@p Yeah, I agree. Scan away if that's all you're doing. The people that do these services... bro, I can't even tell you how much I hate them. It's the same type of sniveling fuckbags that get your social security number and address and then leave them up on an unsecured server to get leaked. Then they'll prance around and talk about "competitive data analytics" or some bullshit.
@ins0mniak@p They like to spoof their user agents to look like an iPhone or some other benign device. But if all that user agent does is HTTP GET and never POST, then it's a scraper.
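A rough sketch of that GET-but-never-POST check, in Python. Everything here is an assumption for illustration: the function name `get_only_clients`, the `(client, method)` tuple input, and the `min_requests` cutoff are made up, and you would have to feed it from your own access logs. It only flags candidates, since plenty of legit readers never POST either.

```python
from collections import Counter

def get_only_clients(requests, min_requests=50):
    """Return clients that sent at least `min_requests` requests, all of them GET.

    `requests` is an iterable of (client, method) tuples, where `client` is
    whatever key you group by (user agent string, IP, or both).
    """
    totals = Counter()
    non_get = Counter()
    for client, method in requests:
        totals[client] += 1
        if method.upper() != "GET":
            non_get[client] += 1
    # Keep only busy clients that never sent anything but GET.
    return sorted(
        client for client, count in totals.items()
        if count >= min_requests and non_get[client] == 0
    )
```

The cutoff matters: with a low `min_requests` this will flag every casual visitor, so treat the output as a list to eyeball, not a blocklist.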
@p@dj Imagine paying thousands of dollars for something you could do for free with a little Go programming....(or whatever the hell else someone wants to use)
> hammer my Gitea instance with requests for every file in most repos and asking for every revision of those files.
Complete retards.
I was talking about this a while ago, like, they love git repos. People make these complex tarpits for AI but all you have to do is just run cgit somewhere.
> when they didn't receive a response in time (obvious when you are sending ~15r/s to a small server), they just closed the connection and tried again in 30 minutes while still scraping other files.
Fucking assholes.
> Claude on the other hand completely ignores the meta tag and robots.txt,
Are they one of the ones that tries the "/ai.txt" or something or do they just fucking scrape?
> They send requests for issues with numbers in the thousands and never stop when literally all of them return a 404.
Oh, I think they queue it up and then don't even notice until the queue is empty. I ended up just killing off their IPs, but because I also had to wipe the logs (media.fse ran out of space on /var) I can't check if they did.
Although it was good for a laugh. Watching Taylor Lorenz spin around going "what is federation?, where am I?" lol. Bitch read the documentation
Yeah I haven't seen much NAFO stuff anywhere all of a sudden except for a few Canadian accounts on X raging about Trump. Those seem to be people just bandwagon jumping though.
If Trump withdraws from NATO I will build churches in his honor.
> I ended up just killing off their IPs, but because I also had to wipe the logs (media.fse ran out of space on /var) I can't check if they did.
With Claude it's at least easy. Return 403 to the UA and you are done. Which btw still does not stop their attempts at scraping. They will continue to hit the webserver even when they obviously aren't let through. From there a log monitor will do the job.
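A minimal sketch of what such a log monitor could look like, in Python. It assumes the access log is in nginx's default "combined" format and that the 403 is already being returned for the UA. The function name `repeat_offenders` and the `threshold` value are invented for the example, and what you do with the resulting IPs (firewall, nullroute) is left out.

```python
import re
from collections import Counter

# Assumed layout (nginx "combined"):
# IP - user [time] "request" status bytes "referer" "user agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def repeat_offenders(lines, ua_token="ClaudeBot", threshold=20):
    """Count 403 responses per IP for user agents containing `ua_token`.

    Returns {ip: hits} for IPs that kept coming back at least
    `threshold` times despite being refused.
    """
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        if m["status"] == "403" and ua_token.lower() in m["ua"].lower():
            hits[m["ip"]] += 1
    return {ip: n for ip, n in hits.items() if n >= threshold}
```

Usage would be something like `repeat_offenders(open("/var/log/nginx/access.log"))`, with the path adjusted to wherever your logs actually live.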
With the Chinese scrapers, it's a bit harder than automated log monitoring. They are clever in a way, where they will not send you more than approx. 3 requests from one IP, meaning that the typical monitoring tools like fail2ban or something custom won't work, as all of the ones I know of don't do subnet/ASN detection, and if you tune them to catch this they get very trigger-happy.
Thankfully they are retarded in other ways which make them stick out like a sore thumb in the logs. Currently I just look at the logs every few days unless they trigger alerts and throw the whole announced prefix into the trash. So far that has worked out great.
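The "few requests per IP, lots of IPs per network" pattern can be roughly sketched in Python by grouping source IPs into fixed-size networks and counting distinct addresses. This is a stand-in, not the real thing: a fixed /24 (or /48 for IPv6) is an assumption and is not the same as the announced prefix, which you would still have to look up. The name `noisy_prefixes` and the `min_ips` cutoff are made up for the example.

```python
import ipaddress
from collections import defaultdict

def noisy_prefixes(ips, v4_len=24, v6_len=48, min_ips=10):
    """Group source IPs into fixed-size networks and count distinct addresses.

    Returns {network: distinct_ip_count} for networks where at least
    `min_ips` different addresses showed up, which is the case a per-IP
    rate limit never notices.
    """
    buckets = defaultdict(set)
    for raw in ips:
        addr = ipaddress.ip_address(raw)
        prefix_len = v4_len if addr.version == 4 else v6_len
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        buckets[net].add(addr)
    return {
        str(net): len(addrs)
        for net, addrs in buckets.items()
        if len(addrs) >= min_ips
    }
```

Counting distinct addresses rather than requests is the design choice here: one chatty IP does not inflate the score, but a network rotating through a dozen addresses does.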
@nyanide@p@ins0mniak I'll send them when I'm done with other stuff (couple hours).
When in doubt bgp.he.net is your friend. Throw one of the annoying IPs into search->click on AS number->Prefixes vX and enjoy all the nullroutable prefixes.
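Once you have the prefix list copied out, turning it into nullroutes can be sketched like this in Python. It assumes a Linux box with iproute2 and that you paste the prefixes in yourself; the function name `blackhole_commands` is made up, and it only builds the command strings so you can read them before running anything as root.

```python
import ipaddress

def blackhole_commands(prefixes):
    """Turn a list of CIDR prefixes into iproute2 blackhole route commands.

    Adjacent or overlapping prefixes are merged first so the routing
    table does not fill up with redundant entries.
    """
    v4, v6 = [], []
    for raw in prefixes:
        net = ipaddress.ip_network(raw.strip(), strict=False)
        (v4 if net.version == 4 else v6).append(net)

    commands = []
    # collapse_addresses needs all networks to be the same IP version.
    for net in ipaddress.collapse_addresses(v4):
        commands.append(f"ip route add blackhole {net}")
    for net in ipaddress.collapse_addresses(v6):
        commands.append(f"ip -6 route add blackhole {net}")
    return commands
```

Printing the result with `print("\n".join(blackhole_commands(my_prefixes)))` gives a script you can review; `my_prefixes` is just a placeholder for whatever list you collected.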
> With Claude it's at least easy. Return 403 to the UA and you are done.
They completely hammered fedilist, no matter what I returned.
> they will not send you more than approx. 3 requests from one IP, meaning that the typical monitoring tools like fail2ban or something custom won't work
Oh, yeah, same shit they do with ssh. Luckily you can just kill off IPs on port 22 because it doesn't matter.
@ins0mniak@phnt Oh, yeah, absolutely. In fact, if you just replay the same shit they are doing back at the machines that are sending the traffic, you probably get a bot army yourself.
@phnt@p@nyanide@ins0mniak if i wanted to scrape from the fediverse i'd just set up an instance and a user i use to talk to others amicably and that's it
@mischievoustomato@p@nyanide@ins0mniak These aren't for Fedi scrapers. These are IPs that kept hammering my Gitea instance until it almost died. One day I literally woke up with 20 alerts in my inbox because of these idiots.