翠星石 (suiseiseki@freesoftwareextremist.com)'s status on Friday, 11-Oct-2024 01:25:07 JST
@wowaname It's much worse.
LLM scrapers scrape every single page and every single file, changing useragents, rotating IPs, and auto-adjusting their scrape rate to avoid getting blocked. Some even set a useragent that is empty, "" (which shows up as "-" in nginx logs), or a literal "-", so when you try to block it you inadvertently 403 the wrong useragent, or even accidentally 403 all of them.
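If you're curious, this is roughly the footgun in nginx terms (a sketch, not a drop-in config; note that "" and "-" are two different header values even though the default log format prints both as "-"):

# goes at the http{} level
map $http_user_agent $deny_ua {
    default 0;
    ""      1;  # request sent no User-Agent header at all; logged as "-"
    "-"     1;  # request sent a literal "-" User-Agent; also logged as "-"
}

# then inside a server{} or location{} block:
# if ($deny_ua) { return 403; }

Work from the log line alone and you can't tell which of the two you're actually blocking.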
Meanwhile, crawlers at least seem to identify themselves with a crawler useragent, which you can 403, or they crawl at a rate that uses a negligible amount of bandwidth on any half-decent connection.
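Which makes the polite ones trivial to block if you want to (another sketch; the bot names here are just examples, substitute whatever actually shows up in your own access logs):

# http{} level; case-insensitive regex match on self-identified bots
map $http_user_agent $deny_crawler {
    default                        0;
    "~*(GPTBot|CCBot|Bytespider)"  1;
}

# server{} or location{} block:
# if ($deny_crawler) { return 403; }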
I believe that you're probably being hit by LLM scrapers that are pretending to be spiders.