The #copyright suit by the Authors Guild and co. against OpenAI is directly framing perhaps the most plausible cause of action against generative #ai: reproduction in the process of training (and maybe data set assembly, though the complaint could be more precise about this). There are some hard civ pro hurdles to overcome to get the proof of specific incidents of infringement that they're alleging, and a hard fair use fight on the back end. But this is one to watch. https://storage.courtlistener.com/recap/gov.uscourts.nysd.606655/gov.uscourts.nysd.606655.1.0_1.pdf
This strategy avoids the trouble of drawing a connection between works in the training data and the *outputs* of the models by focusing more granularly on the inherent copying entailed in training. If the plaintiffs can get over the procedural hurdles of establishing specific infringement (and class cert, etc.), it also has the advantage of shifting the burden of dealing with the outputs into the case for fair use, which is way more atomistic and complex post-Warhol.
@blakereid Huh. From my technologist’s seat, this seems like a dangerous argument I’d want to see fail, while the output argument is one that plainly ought to succeed. But I realize that the law hinges on nobody’s sense of “ought to”!
@inthehands I'm not sure where I come out on the right answer. But I can at least see the argument: there's unauthorized copying of vast swaths of works into a machine that is designed, at least in part, to use the content of those works to economically displace their authors.
@blakereid Hmm, that’s a good point: purpose matters. Google copying content during indexing for the purpose of directing people •to• it is a fundamentally different ethical question from copying during training for the purpose of •displacing• it. Is the legal concept of harm in copyright sharp enough / strong enough for that distinction to matter?
My naïve, foolish, ignorant, IANAL reading of copyright law is that it is economic impact, not design intent, that determines harm. But…maybe the two lawyers here can set me straight on that!
@blakereid @inthehands I wouldn't say LLMs are 'designed' to economically displace book authors. They're designed to compose a wide range of coherent, fluent texts, in response to free-form user prompts, for a very wide range of purposes to which a very wide range of users might put them--from the mundane to the sublime. (I agree that framing will be key here, and that a lot of the framing will be gaming.)
@inthehands @blakereid Intent is relevant, potentially, to the first fair use factor: the purpose and character of the use. That factor has tended to decompose into two subfactors: commercial or noncommercial; transformative or nontransformative. But that rubric is a little up in the air after AWF v. Goldsmith. Some (but few) courts consider whether the use is good faith or bad faith, but that's appropriately uncommon imo.
@AnnemarieBridy @inthehands depends on which platform we’re talking about, but there are some overt declarations of intended uses that at least imply displacement (e.g., https://openai.com/blog/partnership-with-american-journalism-project-to-support-local-news). I also think displacement is pretty strongly implied by the transformativity argument. But then again, there are hard first-order questions about how to divine purpose and how to sort between multiple purposes (complicated further by the mess of Warhol). Very hard to make confident predictions here.
@blakereid @inthehands If the stated purpose of the partnership is “to develop tools that could assist local news organizations,” I’m not sure where you’re seeing implied displacement. There is certainly a fear of disinformation, but that doesn’t strike me as a substitution issue that copyright is meant to address, and I’m not sure how saying that training LLMs isn’t fair use would address the problem of disinformation (which is being generated and distributed quite widely right now by humans).
@AnnemarieBridy @inthehands some of them even say the quiet part out loud, like "Journalists and newsrooms use Lede AI for high school and college sports reporting, automating the process so that they don't have to dedicate resources to local sports coverage."
@AnnemarieBridy @inthehands a great many "local news organizations" have private equity owners whose explicit goal is to reduce costs (i.e., headcount) and who are very likely to treat the availability of AI tools to generate content as cover to displace journalists.