@edsu @chpietsch Was it naïve or a brilliant way to avoid regulation? Remember “Do Not Track”? Ditto. I think naïve is thinking that the assholes in Big Tech don’t know exactly what they’re doing when they seek to avoid accountability. But hey, at this point, they have the world’s most lethal military behind them, so I guess accountability is moot.
Aral Balkan (aral@mastodon.ar.al)'s status on Tuesday, 25-Mar-2025 20:18:42 JST
Ed Summers (edsu@social.coop)'s status on Tuesday, 25-Mar-2025 20:18:44 JST
@chpietsch yes, I guess you could look at it as naive. In many ways robots.txt was naive too. But one aspect of this is that we need ways for rights holders to assert their wishes, so that courts in jurisdictions that care (e.g. the EU) can use them as evidence. And there needs to be more nuance than what robots.txt provides:
https://mailarchive.ietf.org/arch/msg/ai-control/EJ-84k8Zzh21vY1dHPZvDeYOLes/
Christian Pietsch (chpietsch@fedifreu.de)'s status on Tuesday, 25-Mar-2025 20:18:45 JST
My experience with @base and other web services run by Bielefeld University Library is in line with @gluejar's.
The IETF sound naive when they claim that “[r]ight now, AI vendors use a confusing array of non-standard signals in the robots.txt file (defined by RFC 9309) and elsewhere to guide their crawling and training decisions” when in reality many of them ignore whatever signals a website sends them. They even plunder the shadow libraries.
Ed Summers (edsu@social.coop)'s status on Tuesday, 25-Mar-2025 20:18:46 JST
GenAI bots are pushing websites into a corner that imperils open access and, perhaps worse, the web's historical record. From @gluejar:
https://go-to-hellman.blogspot.com/2025/03/ai-bots-are-destroying-open-access.html
Assuming that the web will continue to evolve instead of getting crushed underfoot, there is some interesting work going on over at the IETF about how to build on the now aged robots.txt protocol to allow rights holders to express how their content can be used online: