@glynmoody It's copyright violation on a massive scale, is what it is, and the anti-piracy group is in the right in this case. It's possible to train LLMs legally: but the techbro culture of "move fast and break things" includes "laws" in things they think it's okay to break.
@cstross I'm afraid I don't agree. Extracting patterns from copyrighted material isn't an infringement, and claiming that it is constrains the sharing of knowledge and just sets up publishers as gatekeepers.
@glynmoody@cstross Given that the AIs have repeatedly been made to cough up source material, I'd say they aren't extracting patterns, but rather incorporating the original material whole hog.
@cstross *if* the entire book is regurgitated, then you can plausibly claim infringement for that particular output; but otherwise it's just mathematical weights based on millions of books - that's different.
@glynmoody Nope: in UK copyright law "fair dealing" covers SHORT extracts of up to about 20 words. Emit more than that? The LLM is in breach of the law.
@dmarti thus guaranteeing that EU AI is not as good as US/Chinese AI and that EU viewpoints are underrepresented in the weightings... @fabiocosta0305@cstross
@fabiocosta0305@glynmoody@cstross The fair use issue in the USA still isn't decided. In the EU, though, AI can be trained on copyrighted works unless the copyright holder "expressly reserves" -- which all the mainstream publishing companies, and most independent authors, have been doing
@glynmoody@cstross AFAIK, training AI doesn't meet the criteria for Fair Use, UNLESS you can build an AI that creates parodies or critiques of that material.
@glynmoody@cstross mind you, today's realities probably include the law bending over backwards to accommodate billionaires. I wonder if this Dutch LLM is owned by a billi... "The LLM developers lack the funds to litigate the matter" so that's that then
@cstross@glynmoody and we all know UK fair dealing is fairly broken too. There's a line somewhere between "I stole your book" and "I learned to write by reading your book". It's incredibly murky even in law for humans, and it's going to need some case law and other work to untangle at all, let alone sanely. Other bits are really bad LLM design that needs fixing. An LLM shouldn't regurgitate half a book to answer a question; it should tell you what book to read, because it knows it's using a lot of material.
@glynmoody@rayres@cstross You could fairly accuse me of hypocrisy but I see a human making fair use of an artist's content to produce new art quite differently from an LLM using it to produce garbage. If I were an artist and my art inspired new art, I would be honoured. The garbage an LLM vomits out... not so much.
I don't think LLMs should be able to claim fair use. What they do is not what fair use is meant for.