Conversation

nixCraft 🐧 (nixcraft@mastodon.social)'s status on Tuesday, 05-Aug-2025 05:24:46 JST

Damn. The AI war is getting heated: https://xcancel.com/eastdakota/status/1952379571527193017 All AI companies ignore robots.txt, and any other block you put in place is ignored too. Without your private data, their AI can't answer anything. They are stealing from everyone. It is as simple as that.
EDIT: Original blog post https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/
- 翠星石 (suiseiseki@freesoftwareextremist.com)'s status on Tuesday, 05-Aug-2025 22:32:30 JST
  in reply to eliseo
  @eliseo01 @nixCraft The issue isn't the copying.
  
  The issue is that the companies are scraping the hell out of every website to collect as much information as possible, for the purpose of rendering that information totally proprietary (inserted into an inscrutable, undocumented database as part of an LLM).
  
  Such information will only be available via proprietary software and SaaSS, thus people will indeed be attacked.
  
  It would be no problem if the company were merely building a decent search engine, with a reasonable spider crawl speed, that was usable without proprietary software - but they're not doing that.
  
- eliseo (eliseo01@fe.disroot.org)'s status on Tuesday, 05-Aug-2025 22:32:32 JST
  in reply to
  
  @nixCraft
  
  I'd like to know how exactly public information equates to "private data" and how copying it is "stealing from everyone". I'm genuinely surprised to see this double standard on the Fediverse of all places, where I'd like to assume people are aware that copying is not stealing and that information wants to be free.
  
  There are many valid criticisms of the indiscriminate, industrial-scale use of LLM bots and scrapers, but copying publicly available data is not one of them. In fact, this only portrays you as a hypocrite, especially if you're in favor of a decentralized web and the free flow of information.
  