GNU social JP
GNU social JP is a Japanese GNU social server.
Conversation

Notices

  1. JP (jplebreton@mastodon.social)'s status on Wednesday, 02-Jul-2025 09:24:57 JST

    Does anyone know the concrete technical reason(s) that LLM website scrapers have been so much nastier to deal with than the ones used by major search engines? Do these people just not know how to write a scraper that won't DDoS a server (or have the equivalent effect)? Are they trying to get the data faster or more thoroughly than other scrapers? Obviously they don't care, but I can't tell whether that's the main reason they're so horrible or whether there's a more technical cause.

    • Rich Felker (dalias@hachyderm.io)'s status on Wednesday, 02-Jul-2025 09:24:57 JST

      @jplebreton They used LLM codegen to write their scrapers, making them particularly shit. 🙃

    • silverwizard (silverwizard@convenient.email)'s status on Wednesday, 02-Jul-2025 13:50:13 JST
      @jplebreton Part of it is that people are hitting everything randomly. My node gets 10+ hits an hour to its search endpoint for random subjects, which is just berserk behaviour for a crawler in general.
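
For contrast with the behaviour described in this thread, here is a minimal sketch of two courtesies a well-behaved crawler is generally expected to observe: honouring robots.txt (which would typically keep it off a search endpoint) and spacing out requests per host. This is an illustration only, not the code of any scraper mentioned above. The class name `PoliteGate`, the `example-bot` user agent, the `example.org` host, and the one-second default delay are all hypothetical; `urllib.robotparser` is part of the Python standard library.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Hypothetical helper: decides whether and when a crawler may fetch a URL."""

    def __init__(self, user_agent, default_delay=1.0, clock=time.monotonic):
        self.user_agent = user_agent
        self.default_delay = default_delay  # assumed minimum gap per host, in seconds
        self.clock = clock                  # injectable so the logic can be tested
        self.rules = {}                     # host -> parsed robots.txt
        self.next_allowed = {}              # host -> earliest time of the next request

    def load_robots(self, host, robots_txt):
        """Parse robots.txt text that the caller has already downloaded."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.rules[host] = parser

    def seconds_to_wait(self, url):
        """Return None if robots.txt disallows the URL; otherwise return how long
        to sleep before fetching it, and reserve that time slot for the host."""
        host = urlparse(url).netloc
        parser = self.rules.get(host)
        if parser is not None and not parser.can_fetch(self.user_agent, url):
            return None
        delay = self.default_delay
        if parser is not None:
            crawl_delay = parser.crawl_delay(self.user_agent)
            if crawl_delay is not None:
                # Respect the site's Crawl-delay when it is stricter than ours.
                delay = max(delay, float(crawl_delay))
        now = self.clock()
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + delay
        return start - now
```

A caller would sleep for the returned number of seconds before each request and skip any URL for which `seconds_to_wait` returns `None`. Keeping the delay per host rather than global means one slow site does not stall the whole crawl, while no single server sees back-to-back requests.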


GNU social JP is a social network, courtesy of the GNU social JP administrator (GNU social JP管理人). It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.