Conversation

Notices

Embed this notice
Paul Cantrell (inthehands@hachyderm.io)'s status on Wednesday, 20-May-2026 23:54:35 JST Paul Cantrell

Quick strategy discussion, for those who understand Google indexing and SEO:
If I want to yank a web site out of Google’s now-fully-extractive search, should I (1) disallow googlebot in robots.txt or (2) add `<meta name="googlebot" content="noindex">` to all the page headers?
The goal here is not just to remove my contributions to the commons from Google’s results, but to •make Google aware• that sites are pulling consent. What will best do that?
2/2

In conversation about 21 days ago from hachyderm.io permalink
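For anyone weighing option (1), here is a minimal sketch of what a Googlebot-only disallow looks like and how a standards-following parser reads it. The robots.txt content and the `example.com` URL are assumptions for illustration, not taken from the poster's site, and this says nothing about how Google itself treats pages it has already crawled.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for option (1): shut out Googlebot, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


googlebot_ok = allowed(ROBOTS_TXT, "Googlebot", "https://example.com/post")
other_ok = allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/post")
print(googlebot_ok, other_ok)
```

Option (2) is a per-page tag rather than a site-wide file, so the two mechanisms are checked at different points: robots.txt before a fetch, the meta tag only after one.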
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Wednesday, 20-May-2026 23:55:49 JST Paul Cantrell
  in reply to
  
  Same question as the previous post, except for Wikipedia. What would you like to see them do to send a shot across the bow?
  Or…well, it’s Wikipedia. Maybe more like a shot to the hull.
  3/2
  
  In conversation about 21 days ago permalink
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 00:02:53 JST Paul Cantrell
  in reply to
  - disregard Joe Groff
  @joe
  It is and some of us miiiiight already be doing it.
  
  In conversation about 21 days ago permalink
- Embed this notice
  disregard Joe Groff (joe@f.duriansoftware.com)'s status on Thursday, 21-May-2026 00:02:54 JST disregard Joe Groff
  in reply to
  
  @inthehands is "serve LLM poison to googlebot user-agents" on the table
  
  In conversation about 21 days ago permalink
- Embed this notice
  Adam Shostack :donor: :rebelverified: (adamshostack@infosec.exchange)'s status on Thursday, 21-May-2026 00:12:00 JST Adam Shostack :donor: :rebelverified:
  in reply to
  
  @inthehands (3) sue on the basis that it’s not fair use, and these derivative works clearly have a dramatic impact on the value of the original site
  
  In conversation about 21 days ago permalink
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 00:12:00 JST Paul Cantrell
  in reply to
  - Adam Shostack :donor: :rebelverified:
  @adamshostack
  This is clearly how copyright law as written •should• work. Not sure if it’s how it •does• work, but if anybody’s trying, they have my sword.
  
  In conversation about 21 days ago permalink
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 00:29:55 JST Paul Cantrell
  in reply to
  
  Going with meta noindex for now. My thinking is that this actively tells Google to yank already-crawled content from their index, whereas they might take a robots.txt entry to mean “do not update, but keep showing last fetched.”
  In conversation about 21 days ago permalink
  Attachments
  1. Screenshot of a curl request for innig.net. In the response HTML is this comment:  The comment is followed by a list of noindex directives for all of Google’s crawler bots: <meta content='noindex,nofollow' name='google'>
    https://media.hachyderm.io/media_attachments/files/116/607/614/432/242/035/original/5e9a0f652530c728.jpeg
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 01:24:04 JST Paul Cantrell
  in reply to
  
  OK, a •lot• of replies need this response:
  Yes, of •course• they will start ignoring robots.txt etc as soon as they think it hurts their business. Of course.
  It is important to •force that fight•, rather than just capitulating in advance.
  In conversation about 21 days ago permalink
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 01:30:46 JST Paul Cantrell
  in reply to
  - ShadSterling
  - disregard Joe Groff
  @joe @ShadSterling
  I share Joe’s concern that poison-in-box systems will become detectable, but they seem like a good place to start.
  I’m even more a fan of bespoke one-off poison generators for those of us who have the means to write them. Both/and.
  
  In conversation about 21 days ago permalink
- Embed this notice
  ShadSterling (shadsterling@mastodon.social)'s status on Thursday, 21-May-2026 01:30:49 JST ShadSterling
  in reply to
  - disregard Joe Groff
  @joe @inthehands is there a coordinated effort that has a website? And/or server plugins that automate serving coordinated poison?
  
  In conversation about 21 days ago permalink
- Embed this notice
  disregard Joe Groff (joe@f.duriansoftware.com)'s status on Thursday, 21-May-2026 01:30:49 JST disregard Joe Groff
  in reply to
  - ShadSterling
  @ShadSterling @inthehands i don't know if there's a coordinated movement. there are prefab tools like https://lib.rs/crates/iocaine that are relatively easy to deploy, though i imagine they also lose some of their effectiveness as they become more popular and LLM providers start to counter them
  In conversation about 21 days ago permalink
  Attachments
  1. iocaine (lib.rs), from Gergely Nagy: “The deadliest poison known to AI”
- Embed this notice
  disregard Joe Groff (joe@f.duriansoftware.com)'s status on Thursday, 21-May-2026 01:30:50 JST disregard Joe Groff
  in reply to
  
  @inthehands given how eager their summarizer is to incorporate "facts" from even unintentionally adversarial recent posts like satirical blogs, it seems like it wouldn't take much of a coordinated effort to reduce their result quality this way
  
  In conversation about 21 days ago permalink
- Embed this notice
  crystal (crystal@hachyderm.io)'s status on Thursday, 21-May-2026 01:53:37 JST crystal
  in reply to
  
  @inthehands I think if you really want to sell that point, you should explicitly disallow googlebot in robots.txt, then also set up middleware to respond with 404 to any other URI requested by googlebot
  
  In conversation about 21 days ago permalink
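A rough sketch of the middleware half of that suggestion, written as plain WSGI so it needs no framework. Everything here is an assumption for illustration: the `block_googlebot` and `site` names are made up, matching is done on the User-Agent string alone (which anyone can spoof, and which other Google crawlers may not use), and `/robots.txt` is treated as the one URI Googlebot may still read.

```python
def block_googlebot(app):
    """Wrap a WSGI app: Googlebot may fetch /robots.txt, everything else 404s."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        path = environ.get("PATH_INFO", "")
        if "googlebot" in agent and path != "/robots.txt":
            body = b"Not Found"
            start_response(
                "404 Not Found",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everyone else, and Googlebot asking for robots.txt, passes through.
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Stand-in for the real site: answers every request with 200."""
    body = b"hello"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def status_for(app, agent, path):
    """Call a WSGI app with a fake request and return the status line."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    app({"HTTP_USER_AGENT": agent, "PATH_INFO": path}, start_response)
    return seen["status"]


wrapped = block_googlebot(site)
print(status_for(wrapped, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/about"))
print(status_for(wrapped, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/robots.txt"))
print(status_for(wrapped, "Mozilla/5.0 Firefox", "/about"))
```

Because the check is a substring match on a header the client controls, this is a signal more than a barrier.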
- Embed this notice
  Korrupt (korrupt@nrw.social)'s status on Thursday, 21-May-2026 02:51:52 JST Korrupt
  in reply to
  
  @inthehands meta noindex it is, definitely. robots disallow can actually hurt the process, since google cannot access the page carrying the noindex tag and therefore won't deindex it.
  btw, they do indeed respect noindex and robots.txt ATM, which is quite easy to check by seeing whether pages still get found. Then again, you never know what does not show up in search but is used for training (without giving credit, obv.) anyway. As far as i can see, google still remains more standards-compliant than e.g. OpenAI.
  
  In conversation about 21 days ago permalink
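The interaction described here (a robots.txt disallow hiding the very tag that asks for deindexing) can be checked mechanically. The following is a sketch under assumptions: `noindex_status` and `MetaCollector` are hypothetical helper names, the sample robots.txt and HTML are invented, and the logic only models a crawler that follows both conventions as documented.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs, lowercased."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            found = dict(attrs)
            name = (found.get("name") or "").lower()
            if name:
                self.metas[name] = (found.get("content") or "").lower()


def noindex_status(robots_txt, html, crawler="Googlebot",
                   url="https://example.com/"):
    """Classify a page as 'blocked', 'noindex-visible', or 'indexable'."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(crawler, url):
        # The crawler never downloads the page, so it cannot read any meta tag.
        return "blocked"
    collector = MetaCollector()
    collector.feed(html)
    for name in (crawler.lower(), "robots"):
        if "noindex" in collector.metas.get(name, ""):
            return "noindex-visible"
    return "indexable"


PAGE = "<html><head><meta name='googlebot' content='noindex'></head></html>"
DISALLOW = "User-agent: Googlebot\nDisallow: /\n"

print(noindex_status(DISALLOW, PAGE))  # tag exists but is never fetched
print(noindex_status("", PAGE))        # tag can actually be read
```

A result of "blocked" on a page that carries noindex is the conflict to look for.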
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 05:23:07 JST Paul Cantrell
  in reply to
  - your auntifa liza 🇵🇷 🦛 🦦
  @blogdiva
  I believe that my various name=“___” values specifically target Google.
  Based on what I’ve read, blocking them in robots.txt will only stop them from •updating• their scrape, whereas noindex means “do not use.” (I have long blocked their LLM-specific bots in robots.txt.)
  
  In conversation about 20 days ago permalink
- Embed this notice
  your auntifa liza 🇵🇷 🦛 🦦 (blogdiva@mastodon.social)'s status on Thursday, 21-May-2026 05:23:08 JST your auntifa liza 🇵🇷 🦛 🦦
  in reply to
  
  instead of no-index ―because this would affect all search engines, not just Google― isn’t there a way to target Google specifically in robots.txt?
  there should be a list of all the major techbros crawlers ―Google, Microslop, Facebook, Amazon, X, etc.
  @inthehands
  
  In conversation about 20 days ago permalink
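On the mechanics of the question: robots.txt does allow per-crawler groups, so a blocklist can be generated from a list of user-agent tokens. A small sketch follows; the tokens in `BLOCKED_AGENTS` are illustrative only and not a complete or vetted list, `build_robots_txt` is a made-up helper, and whether a disallow also removes already-indexed pages is a separate matter discussed elsewhere in this thread.

```python
from urllib.robotparser import RobotFileParser

# Illustrative tokens only; check each vendor's documentation before relying on them.
BLOCKED_AGENTS = ["Googlebot", "bingbot", "Amazonbot", "facebookexternalhit"]


def build_robots_txt(blocked):
    """Emit one Disallow group per blocked agent, then allow everyone else."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked]
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)


robots_txt = build_robots_txt(BLOCKED_AGENTS)
print(robots_txt)

# Sanity check with the standard-library parser.
checker = RobotFileParser()
checker.parse(robots_txt.splitlines())
blocked_results = {
    agent: checker.can_fetch(agent, "https://example.com/")
    for agent in BLOCKED_AGENTS
}
unlisted_ok = checker.can_fetch("SomeSmallSearchBot", "https://example.com/")
```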
- Embed this notice
  your auntifa liza 🇵🇷 🦛 🦦 (blogdiva@mastodon.social)'s status on Thursday, 21-May-2026 05:33:09 JST your auntifa liza 🇵🇷 🦛 🦦
  in reply to
  
  @inthehands TIL thanks
  
  In conversation about 20 days ago permalink
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Thursday, 21-May-2026 05:33:09 JST Paul Cantrell
  in reply to
  - your auntifa liza 🇵🇷 🦛 🦦
  @blogdiva
  Keep it in pencil. I’m still learning myself, and not sure I understand everything correctly here.
  
  In conversation about 20 days ago permalink
- Embed this notice
  sean watters (swatters@mastodon.social)'s status on Friday, 22-May-2026 05:43:28 JST sean watters
  in reply to
  
  @inthehands i'm spartacus
  In conversation about 19 days ago permalink
  Attachments
  1. screenshot of the ghostty terminal curling https://sean.ordinary.host which returns the same html comment and meta tag list as paul's initial screen shot (with an attribution added to the comment). full text:
    curl https://sean.ordinary.host
    <!doctype html><html lang=en><meta charset=UTF-8><meta name=viewport content="width=device-width,initial-scale=1.0"><title>it me | smw</title>
    <meta content='noindex,nofollow' name='google'>
    <meta content='noindex,nofollow' name='Googlebot'>
    <meta content='noindex,nofollow' name='Googlebot-Extended'>
    <meta content='noindex,nofollow' name='Googlebot-Image'>
    <meta content='noindex,nofollow' name='Googlebot-News'>
    <meta content='noindex,nofollow' name='Googlebot-Video'>
    <meta content='noindex,nofollow' name='Storebot-Google'>
    <meta content='noindex,nofollow' name='GoogleOther'>
    <meta content='noindex,nofollow' name='GoogleOther-Image'>
    <meta content='noindex,nofollow' name='GoogleOther-Video'>
    <meta content='noindex,nofollow' name='Google-CloudVertexBot'>
    <meta content='noindex,nofollow' name='Google-Extended'>
    https://files.mastodon.social/media_attachments/files/116/614/316/616/956/569/original/933ccdc258f5af35.png
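For anyone who wants to emit the same tag list from a template or build script, a short sketch. The names are copied from the screenshot text above; whether Google honors every one of them as a meta name is not confirmed here, and `noindex_tags` is a hypothetical helper.

```python
from html import escape

# Names as they appear in the screenshot text; not independently verified.
GOOGLE_META_NAMES = [
    "google",
    "Googlebot",
    "Googlebot-Extended",
    "Googlebot-Image",
    "Googlebot-News",
    "Googlebot-Video",
    "Storebot-Google",
    "GoogleOther",
    "GoogleOther-Image",
    "GoogleOther-Video",
    "Google-CloudVertexBot",
    "Google-Extended",
]


def noindex_tags(names, directives="noindex,nofollow"):
    """Build one <meta> tag per crawler name, escaping attribute values."""
    return [
        f"<meta content='{escape(directives, quote=True)}' "
        f"name='{escape(name, quote=True)}'>"
        for name in names
    ]


tags = noindex_tags(GOOGLE_META_NAMES)
print("\n".join(tags))
```

The output is meant to be pasted into, or rendered inside, each page's `<head>`.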
- Embed this notice
  Paul Cantrell (inthehands@hachyderm.io)'s status on Friday, 22-May-2026 05:43:28 JST Paul Cantrell
  in reply to
  - sean watters
  @swatters
  Heck yeah! I hope I got the directives right…. :/
  
  In conversation about 19 days ago permalink
- Embed this notice
  Oliver Jensen (ojensen@hachyderm.io)'s status on Monday, 25-May-2026 17:10:49 JST Oliver Jensen
  in reply to
  
  @inthehands i love this idea. I am particularly curious if removing something from their search index also removes the information from the purview of their ai responses. If you have a way to determine this experimentally, I'd love to know the answer.
  
  In conversation about 16 days ago permalink
