JP (jplebreton@mastodon.social)'s status on Wednesday, 02-Jul-2025 09:24:57 JST

Does anyone know the concrete technical reason(s) that LLM website scrapers have been so much nastier to deal with than the ones used by major search engines? Do these people just not know how to write a scraper that won't DDoS (or effectively DDoS) a server? Are they trying to get the data faster or more thoroughly than other scrapers? Do they just not care? Obviously they don't care, but I can't tell whether that's the main reason they're so horrible or whether there's some more technical point.
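[Editor's note: for context on what "a scraper that won't DDoS a server" means in practice, here is a minimal sketch of a polite crawler in Python, using only the standard library. The user agent, robots.txt URL, and delay values are illustrative, not from the thread.]

```python
# Minimal sketch of a "polite" crawler: check robots.txt, honor
# Crawl-delay, and rate-limit requests to the host. Illustrative only.
import time
import urllib.request
import urllib.robotparser

USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot)"  # hypothetical
DEFAULT_DELAY = 10.0  # seconds between requests to the same host

def process(body: bytes) -> None:
    pass  # placeholder: parse/store the fetched page

def polite_crawl(robots_url: str, urls: list[str]) -> None:
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()  # one fetch of robots.txt up front
    # Honor an explicit Crawl-delay directive if the site sets one.
    delay = rp.crawl_delay(USER_AGENT) or DEFAULT_DELAY
    for url in urls:
        if not rp.can_fetch(USER_AGENT, url):
            continue  # disallowed by robots.txt; a polite crawler skips it
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            process(resp.read())
        time.sleep(delay)  # never hammer the same host back-to-back
```

[Search-engine crawlers do roughly this at scale, with per-host request queues and backoff on 429/503 responses; the complaint in this thread is about scrapers that skip all of it.]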
Rich Felker (dalias@hachyderm.io)'s status on Wednesday, 02-Jul-2025 09:24:57 JST

@jplebreton They used LLM codegen to write their scrapers, making them particularly shit. 🙃
silverwizard (silverwizard@convenient.email)'s status on Wednesday, 02-Jul-2025 13:50:13 JST

@jplebreton Part of it is that people are hitting everything randomly. My node gets 10+ hits an hour to its search endpoint for random subjects, which is just berserk behaviour for a crawler in general.
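[Editor's note: a search endpoint runs a query on every hit, so it's far more expensive to serve than a static or cached page, and it's exactly the kind of path sites typically disallow in robots.txt. Below is a minimal sketch of one server-side defense, a per-IP token bucket in front of the search path, written as WSGI middleware; the path, rate, and burst values are illustrative, not anything from the thread.]

```python
# Minimal sketch: per-client throttling of an expensive endpoint,
# as WSGI middleware. Path, rate, and burst are illustrative.
import time

class SearchThrottle:
    def __init__(self, app, path="/search", rate=1.0, burst=5):
        self.app = app
        self.path = path
        self.rate = rate      # tokens refilled per second
        self.burst = burst    # max tokens a client can accumulate
        self.buckets = {}     # client IP -> (tokens, last_seen_time)

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO", "").startswith(self.path):
            ip = environ.get("REMOTE_ADDR", "?")
            tokens, last = self.buckets.get(ip, (self.burst, time.monotonic()))
            now = time.monotonic()
            tokens = min(self.burst, tokens + (now - last) * self.rate)
            if tokens < 1.0:
                # Out of tokens: reject and tell the client when to retry.
                start_response("429 Too Many Requests",
                               [("Retry-After", "10"),
                                ("Content-Type", "text/plain")])
                return [b"slow down\n"]
            self.buckets[ip] = (tokens - 1.0, now)  # spend one token
        return self.app(environ, start_response)
```

[Wrapped around any WSGI app (`app = SearchThrottle(app)`), this returns 429 with a Retry-After header once a client exhausts its burst. A well-behaved crawler backs off when it sees 429, which is part of what separates it from the scrapers being complained about here.]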