@mint the dev's blog post announcing anubis makes it pretty clear they are retarded >If the client has a User-Agent that does not contain "Mozilla", the client is allowed through. so all the scrapers just using curl or aiohttp or axios or whatever keep hammering your site. when anthropic first started scraping they just sent "claudebot" as the user agent
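For reference, the quoted rule boils down to a single substring check. This is only a rough sketch of that one sentence, not Anubis's actual code: the function name `needs_challenge` is made up, the case-sensitive match is an assumption, and the example user-agent strings are illustrative.

```python
def needs_challenge(user_agent: str) -> bool:
    """Return True if the request would be sent to the challenge page.

    Sketch of the quoted rule only: a User-Agent that does not contain
    "Mozilla" is allowed straight through. Case-sensitive matching is
    an assumption, not something the quote states.
    """
    return "Mozilla" in user_agent


if __name__ == "__main__":
    # Illustrative user-agent strings, not taken from real traffic logs.
    examples = (
        "curl/8.5.0",
        "claudebot",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    )
    for ua in examples:
        verdict = "challenge" if needs_challenge(ua) else "allowed through"
        print(f"{ua!r} -> {verdict}")
```

Under this rule anything that announces itself honestly (curl, a bare "claudebot") skips the challenge, while ordinary browsers get it.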
@mint i'd say this is a clever way to hop the fence, but the author really didn't think this through. much like how the author didn't think nix/nixos through, then kubernetes, then golang, instead decided to continue focus on vtubing and shilling their open-source anubis project. i hear some people made their own hashcah implementation with better featureset. this one guy made one davis in it. you may know him :niggawink:
@mint to be fair the idea was probably: drop (in reality, redirect to the communist manifesto) all connections from known scraper user agents yourself, and let anubis deal with the other stuff they don't teach in http 101
So all this javashit blocker degeneracy was never about stopping AI scrapers but about stopping competition and annoying the shit out of free software enjoyers that don't run arbitrary code. It's always the same shtick with this corporate scum, next thing that'll show up is that OpenAI funded Anubis as the cherry on top.
@sally Wanted to try archwiki, but it appears they've relaxed it, since even a regular browser useragent lets me in (it appears to still be running though, as the bypass extension still gets triggered).
@mint apparently that’s the chat gippity search mode and not the scraper. As if proprietards are going to let on the difference. But they want the slopalisk to be able to search archwiki when it’s vomiting.
If your website actually has a bot scraper problem you probably already have a rule that heavily rate limits non-browser UAs. Whether sites that use Anubis actually do that is another question.
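A rule like that could look roughly like the sliding-window limiter below. It is a minimal sketch, not any particular server's config: the class name `NonBrowserRateLimiter`, the limits, and the idea of treating "no Mozilla in the UA" as "non-browser" are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class NonBrowserRateLimiter:
    """Sliding-window limiter applied only to non-browser User-Agents.

    Hypothetical helper: "non-browser" here means the User-Agent lacks
    "Mozilla", which is an assumption for this sketch.
    """

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client IP -> timestamps of that client's recent allowed requests
        self._hits = defaultdict(deque)

    def allow(self, client_ip: str, user_agent: str, now=None) -> bool:
        if "Mozilla" in user_agent:
            # Browser-like UA: this limiter does not touch it.
            return True
        if now is None:
            now = time.monotonic()
        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller would answer 429
        hits.append(now)
        return True
```

Passing `now` explicitly keeps the behaviour deterministic for testing; in a real handler you would leave it out and reject with a 429 when `allow` returns False.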
@eris @lain @mint Aggressive LLM scrapers tend to swap to browser useragents after the scraper useragent gets 403'd.
Such a useragent usually has "AppleWebKit" in it, and nobody is going to complain if you attack isheep with more proprietary software, but alas they attack GNU icecat users etc. instead.