@gray Yes. It, like anything else, has tradeoffs, but I would pick an annoying page over a website that doesn't work because of resources wasted on training more AI models no one wants
@gray It's a firewall that attempts to discourage AI crawling and other assorted denials of service by hiding most websites behind a JavaScript proof-of-work challenge, like anything behind Cloudflare or what Kiwi Farms does nowadays. As someone whose git instance got bombarded by crawlers so hard it eventually stopped working, I added it because it was less drastic than dropping an entire IPv4 /8, and ever since, the AI slurping has either stopped or slowed down to a level where the server no longer gives up on serving anything. Crawlers are super annoying, especially the ones that don't respect any time, resource, or crawling restraints
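For the curious, the proof-of-work part boils down to a hash puzzle. Here's a rough Python sketch of the general idea (not Anubis's actual code or parameters, just the concept it's built on): the server hands out a random challenge, the browser burns CPU finding a nonce, and the server verifies it with a single hash.

```python
import hashlib
import secrets

def make_challenge() -> str:
    # Server side: hand the client a random challenge string.
    return secrets.token_hex(16)

def solve(challenge: str, difficulty: int = 4) -> int:
    # Client side: brute-force a nonce until the hash starts with
    # `difficulty` zero hex digits. Cheap for one visitor, expensive
    # for a crawler hammering thousands of pages.
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
    # Server side: one hash to check the client's answer.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

if __name__ == "__main__":
    c = make_challenge()
    n = solve(c)
    print(verify(c, n))  # True
```

The asymmetry is the whole point: verifying costs the server one hash, solving costs the client tens of thousands, which a real visitor barely notices but a mass scraper pays on every page.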
@sam@froth.zone @gray@clubcyberia.co Queen Sam, is it difficult to set up? I only found out about it yesterday and found it interesting (perhaps I'm being DDoSed and don't even know it)
@sam I loaded some of the websites on my phone and they loaded really fast even while doing the proof of work, so I wasn't sure what the big deal was. Sometimes it's hard to figure out what the latest freak-out is over
@sam @gray @mint I would think that's a bad design choice. Sure, it might cut off some traffic, but if crawlers catch on, it's just a matter of sending another request with a "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36" user agent, and suddenly the problem exists again, only now it's hiding behind an ordinary UA, which makes it worse
The problem is that it doesn't solve the problem it claims to solve, and it's more successful at being a fucking nuisance to people who don't want to run malware on their computer.
@RustyCrab @Inginsub @gray @sally I don't think Gopher is :proprietary:. If I remember my history properly, the reason Gopher lost to HTTP was that the University of Minnesota wanted to charge royalties from anyone using it, which they seem to have abandoned in 2000; the original gopher daemon is GPL.
@waifu @gray No, it's really easy, you can just use the Docker config the website mentions; that's what I do. I'd check whether you're actually being scraped first, which is just a matter of reading the server logs: the respectable AI companies put their company name in their user agent, the freak ones pretend to be regular browsers
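If you want a quick look at who's hitting you, something like this rough sketch works against nginx's default combined log format (the log path and the top-20 cutoff are just examples):

```python
import re
from collections import Counter

# Assumes nginx's default "combined" log format, where the user agent
# is the last quoted field on each line. Path is just an example.
LOG_PATH = "/var/log/nginx/access.log"
UA_RE = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = UA_RE.search(line)
        if match:
            counts[match.group(1)] += 1

# The polite crawlers show up by name here; the impolite ones
# hide among the ordinary Chrome/Firefox strings.
for ua, n in counts.most_common(20):
    print(f"{n:8d}  {ua}")
```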
@waifu @sam @gray You probably aren't, if you aren't hosting mirrors of some large repos. If you want, I can send you ipsets of subnets owned by China-based cloud companies that tried to scrape my Gitea instance, along with a few subnets belonging to Google's usercontent, both of which have zero reason to connect to a git instance.
@waifu @gray @sam Also, if you really want to get rid of AI scraping from the US, all of them afaik use custom user agents that can be easily filtered in nginx.
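Something along these lines, using nginx's map at the http level; the bot names and server are just examples, tune the list to whatever your logs actually show:

```nginx
# http {} context: map the User-Agent to a flag once, then refuse
# anything flagged. Example list of self-identifying US crawlers.
map $http_user_agent $blocked_ua {
    default        0;
    "~*GPTBot"     1;
    "~*ClaudeBot"  1;
    "~*CCBot"      1;
    "~*Amazonbot"  1;
}

server {
    listen 80;
    server_name git.example.org;  # placeholder

    if ($blocked_ua) {
        return 403;
    }
}
```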
@sam @gray @waifu I know, I have like 3/4 of Huawei and Alibaba blocked. Then they switched to using IPs that belong to Google usercontent, which I have no idea how they got, and since I nullrouted those too, the average request rate is basically zero. Just like it should be.
@phnt @gray @waifu The well-behaved Silicon Valley ones use custom user agents; the Chinese ones don't. They all pretended to be regular Chrome back when I blocked 47.0.0.0/8, which is almost entirely Alibaba Cloud IPs
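If you want to sanity-check whether an address in your logs actually falls inside a range like that before nullrouting it, Python's ipaddress module does the subnet math; the second range and the test addresses below are just placeholders:

```python
from ipaddress import ip_address, ip_network

# Example block list: 47.0.0.0/8 is the Alibaba-heavy range mentioned
# above, the second entry is a placeholder for whatever your logs turn up.
BLOCKED = [ip_network(cidr) for cidr in ("47.0.0.0/8", "43.128.0.0/10")]

def is_blocked(addr: str) -> bool:
    ip = ip_address(addr)
    return any(ip in net for net in BLOCKED)

print(is_blocked("47.76.1.2"))    # True, inside 47.0.0.0/8
print(is_blocked("203.0.113.9"))  # False, documentation range
```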
@sam @gray @mint But again, that reduces the problem to skimming logs, which is easier said than done and basically undoes the work of deploying Anubis in the first place
@theorytoe @gray @mint You could theoretically do that, yes. None of the major scrapers actually do, though, because it would then be obvious to anyone snooping the logs that someone is poorly pretending to be an ordinary browser. The point of the Mozilla/5.0 vestige is to pretend you are an ordinary web user when you are, in fact, not one
@sam @Inginsub @RustyCrab @gray @freetar @sally Imagine mentally blocking out freedom so hard that you don't even notice all of the production-ready GNU software that many people use, including in production.
In this context it's all JavaScript not run and served directly by a machine you control. Anubis is very good at blocking users who refuse to run it, hence very effective malware.