GNU social JP
GNU social JP is a Japanese GNU social server.

Conversation

Notices

  1. nixCraft 🐧 (nixcraft@mastodon.social)'s status on Tuesday, 05-Aug-2025 05:24:46 JST

    Damn. The AI war is getting heated: https://xcancel.com/eastdakota/status/1952379571527193017 All AI companies ignore robots.txt, and any other blocks you put up are also ignored. Without your private data, their AI can't answer anything. They are stealing from everyone. It is as simple as that.

    EDIT: Original blog post https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/


    Attachment: https://files.mastodon.social/media_attachments/files/114/971/363/016/604/824/original/9d0ab1c5453a2900.png

    • 翠星石 (suiseiseki@freesoftwareextremist.com)'s status on Tuesday, 05-Aug-2025 22:32:30 JST, in reply to eliseo
      @eliseo01 @nixCraft The issue isn't the copying.

      The issue is that the companies are scraping the hell out of all websites to collect as much information as possible, for the purpose of rendering that information totally proprietary (inserted into an inscrutable, undocumented database as part of an LLM).

      Such information will only be available via proprietary software and SaaSS, thus people will indeed be attacked.

      It would be no problem if the company were merely building a decent search engine, with a reasonable spider crawl speed, that would be usable without proprietary software - but they're not doing that.
    • eliseo (eliseo01@fe.disroot.org)'s status on Tuesday, 05-Aug-2025 22:32:32 JST
      @nixCraft

      I'd like to know how exactly public information equates to "private data" and how copying it is "stealing from everyone". I'm genuinely surprised to see this double standard on the Fediverse, of all places, where I'd like to assume people are aware that copying is not stealing and that information wants to be free.

      There are many valid criticisms of indiscriminate, industrial-scale usage of LLM bots and scrapers, but copying publicly available data is not one of them. In fact, this only portrays you as a hypocrite, especially if you're in favor of a decentralized web and the free flow of information.


GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.