GNU social JP
GNU social JP is a GNU social server in Japan.
Conversation

Notices

  1. Embed this notice
    Paul Cantrell (inthehands@hachyderm.io)'s status on Wednesday, 20-May-2026 23:54:35 JST

    Quick strategy discussion, for those who understand Google indexing and SEO:

    If I want to yank a web site out of Google’s now-fully-extractive search, should I (1) disallow googlebot in robots.txt or (2) add `<meta name="googlebot" content="noindex">` to all the page headers?

    The goal here is not just to remove my contributions to the commons from Google’s results, but to •make Google aware• that sites are pulling consent. What will best do that?

    2/2

    In conversation about 21 days ago from hachyderm.io permalink
    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Wednesday, 20-May-2026 23:55:49 JST
      in reply to

      Same question as the previous post, except for Wikipedia. What would you like to see them do to send a shot across the bow?

      Or…well, it’s Wikipedia. Maybe more like a shot to the hull.

      3/2

      In conversation about 21 days ago permalink
    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 00:02:53 JST
      in reply to
      • disregard Joe Groff

      @joe
      It is and some of us miiiiight already be doing it.

      In conversation about 21 days ago permalink
    • Embed this notice
      disregard Joe Groff (joe@f.duriansoftware.com)'s status on Thursday, 21-May-2026 00:02:54 JST
      in reply to

      @inthehands is "serve LLM poison to googlebot user-agents" on the table

      In conversation about 21 days ago permalink
    • Embed this notice
      Adam Shostack :donor: :rebelverified: (adamshostack@infosec.exchange)'s status on Thursday, 21-May-2026 00:12:00 JST
      in reply to

      @inthehands (3) sue on the basis that it’s not fair use, and these derivative works clearly have a dramatic impact on the value of the original site

      In conversation about 21 days ago permalink
    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 00:12:00 JST
      in reply to
      • Adam Shostack :donor: :rebelverified:

      @adamshostack

      This is clearly how copyright law as written •should• work. Not sure if it’s how it •does• work, but if anybody’s trying, they have my sword.

      In conversation about 21 days ago permalink
    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 00:29:55 JST
      in reply to

      Going with meta noindex for now. My thinking is that this actively tells Google to yank already-crawled context from their index, whereas they might take a robots.txt entry to mean “do not update, but keep showing last fetched.”

      In conversation about 21 days ago permalink

      Attachments


      1. https://media.hachyderm.io/media_attachments/files/116/607/614/432/242/035/original/5e9a0f652530c728.jpeg
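
A minimal sketch of the "meta noindex" option chosen above, assuming pages are available as HTML strings at build or serve time. The helper name `add_googlebot_noindex` and the sample page are hypothetical; the tag itself is the one quoted at the start of the thread.

```python
import re

# The tag quoted in the thread; name="googlebot" addresses Google's crawler only.
NOINDEX_TAG = '<meta name="googlebot" content="noindex">'


def add_googlebot_noindex(html: str) -> str:
    """Insert the googlebot noindex tag right after the opening <head> tag."""
    if NOINDEX_TAG in html:
        return html  # already tagged, leave the page unchanged
    match = re.search(r"<head\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no <head> element found")
    end = match.end()
    return html[:end] + "\n" + NOINDEX_TAG + html[end:]


# Hypothetical sample page, for illustration only.
page = "<html><head><title>Hi</title></head><body>Hello</body></html>"
patched = add_googlebot_noindex(page)
print(patched)
```

A string-level insert like this is only a stopgap for sites without a shared template; where a template or layout file exists, putting the tag there directly is simpler.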
    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 01:24:04 JST
      in reply to

      OK, a •lot• of replies need this response:

      Yes, of •course• they will start ignoring robots.txt etc as soon as they think it hurts their business. Of course.

      It is important to •force that fight•, rather than just capitulating in advance.

      In conversation about 21 days ago permalink

    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 01:30:46 JST
      in reply to
      • ShadSterling
      • disregard Joe Groff

      @joe @ShadSterling
      I share Joe’s concern that poison-in-box systems will become detectable, but they seem like a good place to start.

      I’m even more a fan of bespoke one-off poison generators for those of us who have the means to write them. Both/and.

      In conversation about 21 days ago permalink
    • Embed this notice
      ShadSterling (shadsterling@mastodon.social)'s status on Thursday, 21-May-2026 01:30:49 JST
      in reply to
      • disregard Joe Groff

      @joe @inthehands is there a coordinated effort that has a website? And/or server plugins that automate serving coordinated poison?

      In conversation about 21 days ago permalink
    • Embed this notice
      disregard Joe Groff (joe@f.duriansoftware.com)'s status on Thursday, 21-May-2026 01:30:49 JST
      in reply to
      • ShadSterling

      @ShadSterling @inthehands i don't know if there's a coordinated movement. there are prefab tools like https://lib.rs/crates/iocaine that are relatively easy to deploy, though i imagine they also lose some of their effectiveness as they become more popular and LLM providers start to counter them

      In conversation about 21 days ago permalink

      Attachments

      1. https://lib.rs/crates/iocaine
        iocaine
        from Gergely Nagy
        The deadliest poison known to AI
    • Embed this notice
      disregard Joe Groff (joe@f.duriansoftware.com)'s status on Thursday, 21-May-2026 01:30:50 JST
      in reply to

      @inthehands given how eager their summarizer is to incorporate "facts" from even unintentionally adversarial recent posts like satirical blogs, it seems like it wouldn't take much of a coordinated effort to reduce their result quality this way

      In conversation about 21 days ago permalink
    • Embed this notice
      crystal (crystal@hachyderm.io)'s status on Thursday, 21-May-2026 01:53:37 JST
      in reply to

      @inthehands I think if you really want to sell that point, you should explicitly disallow googlebot in robots.txt, then also set up the middleware to respond with 404 to any other URI requested by googlebot

      In conversation about 21 days ago permalink
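
One way to read the middleware suggestion above, sketched as plain WSGI with no framework. The names `block_googlebot`, `demo_app`, and `status_for` are hypothetical, and matching on a `googlebot` substring in the User-Agent header is an assumption: user-agent strings can be spoofed in both directions, so this is a statement of intent rather than an access control.

```python
def block_googlebot(app):
    """Wrap a WSGI app: Googlebot gets 404 for everything except /robots.txt."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        path = environ.get("PATH_INFO", "")
        if "googlebot" in user_agent.lower() and path != "/robots.txt":
            body = b"Not Found"
            start_response("404 Not Found", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else, and Googlebot asking for robots.txt, reaches the real app.
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in application that answers 200 to every request."""
    body = b"hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(app, path, user_agent):
    """Call a WSGI app directly and return the status line it sent."""
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    environ = {
        "REQUEST_METHOD": "GET",
        "PATH_INFO": path,
        "HTTP_USER_AGENT": user_agent,
    }
    list(app(environ, start_response))
    return seen[0]


wrapped = block_googlebot(demo_app)
bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(status_for(wrapped, "/post/1", bot))
print(status_for(wrapped, "/robots.txt", bot))
print(status_for(wrapped, "/post/1", "Mozilla/5.0 Firefox/126.0"))
```

Leaving `/robots.txt` reachable matters here: the crawler has to be able to fetch the file in order to read the disallow rule.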
    • Embed this notice
      Korrupt (korrupt@nrw.social)'s status on Thursday, 21-May-2026 02:51:52 JST
      in reply to

      @inthehands meta noindex it is, definitely. robots disallow can actually hurt the process, since google cannot access the file with the noindex header and therefore won't deindex.
      btw, they do indeed respect noindex and robots.txt ATM, since it's quite easy to check if pages still get found. Then again, you never know what does not show up in search but is used for training (without giving credit, obv.) anyway. As far as i can see, google still remains more standards-compliant than e.g. OpenAI.

      In conversation about 21 days ago permalink
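
The interaction described above can be illustrated with Python's standard `urllib.robotparser`. This is a sketch of how a rule-abiding crawler evaluates robots.txt, not a model of Google's actual pipeline; the robots.txt contents and the `example.com` URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that singles out Googlebot and allows everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """True if a crawler that honors robots.txt would request this URL."""
    return parser.can_fetch(user_agent, url)


url = "https://example.com/post/1"  # placeholder URL
for agent in ("Googlebot", "Bingbot"):
    print(agent, may_fetch(agent, url))
```

A crawler for which `may_fetch` is false never downloads the page, so it never sees a noindex tag inside it. That is why combining a robots.txt disallow with meta noindex can leave already-indexed pages in place.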
    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 05:23:07 JST
      in reply to
      • your auntifa liza 🇵🇷 🦛 🦦

      @blogdiva
      I believe that my various name=“___” values specifically target Google.

      Based on what I’ve read, blocking them in robots.txt will only stop them from •updating• their scrape, whereas noindex means “do not use.” (I have long blocked their LLM-specific bots in robots.txt.)

      In conversation about 20 days ago permalink
    • Embed this notice
      your auntifa liza 🇵🇷 🦛 🦦 (blogdiva@mastodon.social)'s status on Thursday, 21-May-2026 05:23:08 JST
      in reply to

      instead of no-index ―because this would affect all search engines, not just Google― isn’t there a way to target Google specifically in robots.txt?

      there should be a list of all the major techbros crawlers ―Google, Microslop, Facebook, Amazon, X, etc.

      @inthehands

      In conversation about 20 days ago permalink
    • Embed this notice
      your auntifa liza 🇵🇷 🦛 🦦 (blogdiva@mastodon.social)'s status on Thursday, 21-May-2026 05:33:09 JST
      in reply to

      @inthehands TIL thanks

      In conversation about 20 days ago permalink
    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 05:33:09 JST
      in reply to
      • your auntifa liza 🇵🇷 🦛 🦦

      @blogdiva

      Keep it in pencil. I’m still learning myself, and not sure I understand everything correctly here.

      In conversation about 20 days ago permalink
    • Embed this notice
      sean watters (swatters@mastodon.social)'s status on Friday, 22-May-2026 05:43:28 JST
      in reply to

      @inthehands i'm spartacus

      In conversation about 19 days ago permalink

      Attachments


      1. https://files.mastodon.social/media_attachments/files/116/614/316/616/956/569/original/933ccdc258f5af35.png
    • Embed this notice
      Paul Cantrell (inthehands@hachyderm.io)'s status on Friday, 22-May-2026 05:43:28 JST
      in reply to
      • sean watters

      @swatters
      Heck yeah! I hope I got the directives right…. :/

      In conversation about 19 days ago permalink
    • Embed this notice
      Oliver Jensen (ojensen@hachyderm.io)'s status on Monday, 25-May-2026 17:10:49 JST
      in reply to

      @inthehands i love this idea. I am particularly curious if removing something from their search index also removes the information from the purview of their ai responses. If you have a way to determine this experimentally, I'd love to know the answer.

      In conversation about 16 days ago permalink


GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

Creative Commons Attribution 3.0 All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.