I agree with a lot of people that the opt-out content model of the Web is pretty amazing. The fact that you can publish content that is available to all, including bots, and still restrict those bots with an easy-to-edit robots.txt file, is remarkably cool. If you've ever used it, you'll know how effective it can be.
Evan Prodromou (evan@cosocial.ca)'s status on Thursday, 14-Mar-2024 10:17:14 JST Evan Prodromou -
Evan Prodromou (evan@cosocial.ca)'s status on Thursday, 14-Mar-2024 10:19:01 JST Evan Prodromou The per-page and per-path restrictions are also really cool. Being able to make those choices is important.
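For readers who have not written one, here is a rough sketch of what per-path and per-page rules can look like, checked with Python's standard `urllib.robotparser`. The paths, the `example.com` host, and the `ExampleBot` name are made up for illustration; they are not from the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one whole directory and one single page are off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/unfinished.html
"""

def build_parser(text):
    """Parse robots.txt text (no network access) and return the parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed crawler name; the '*' group applies to it.
in_blocked_dir = parser.can_fetch("ExampleBot", "https://example.com/private/notes.html")
blocked_page = parser.can_fetch("ExampleBot", "https://example.com/drafts/unfinished.html")
open_page = parser.can_fetch("ExampleBot", "https://example.com/blog/post.html")

print(in_blocked_dir, blocked_page, open_page)
```

Note that this only models what a well-behaved crawler would decide for itself; robots.txt is a request, not an access control.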
-
Evan Prodromou (evan@cosocial.ca)'s status on Thursday, 14-Mar-2024 10:24:15 JST Evan Prodromou @blake no. Why do you think that?
-
Blake Leonard (blake@infosec.town)'s status on Thursday, 14-Mar-2024 10:24:17 JST Blake Leonard @evan This has got to be satire. Right?
-
Evan Prodromou (evan@cosocial.ca)'s status on Thursday, 14-Mar-2024 10:24:48 JST Evan Prodromou One thing that's not possible with robots.txt is filtering by robot *type* rather than robot *identity* (via User-Agent). So, I can block `Googlebot-Image` to keep that one bot from spidering my site, but I can't specify a class of entities like "all image search indexers" or "all search indexers". I don't know if this was ever considered, but I think it'd be interesting to know why it doesn't exist.
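To make the identity-versus-type point concrete, here is a small sketch using Python's standard `urllib.robotparser`. The rule names `Googlebot-Image` as in the post; `OtherImageBot` and the `example.com` URL are hypothetical stand-ins for "some other image indexer".

```python
from urllib.robotparser import RobotFileParser

# Rules are keyed on a User-Agent token; there is no field for a class of bots.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://example.com/photos/cat.jpg"  # assumed example URL

# The named bot is blocked...
named_bot_allowed = parser.can_fetch("Googlebot-Image", URL)
# ...but a different (hypothetical) image indexer is not matched by the rule.
other_bot_allowed = parser.can_fetch("OtherImageBot", URL)

print(named_bot_allowed, other_bot_allowed)
```

Covering "all image search indexers" this way would mean listing every such bot's User-Agent token one by one, which is the gap described above.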
-
Evan Prodromou (evan@cosocial.ca)'s status on Thursday, 14-Mar-2024 10:41:52 JST Evan Prodromou @blake I usually put sarcastic/satirical stuff in quotation marks.