"It’s impossible to sell other people’s cars without stealing them, therefore we must be allowed the business model of stealing cars"
https://www.theguardian.com/technology/2024/jan/08/ai-tools-chatgpt-copyrighted-material-openai
@emovulcan you realize that schools have to literally buy the books in their library, right?
They don’t go to bookstores and steal them.
And parents/pupils have to buy (very expensive) textbooks for courses.
Why are you trying (badly) to be a white knight for VC-funded AI companies who rip off authors? Who hurt you?
@thomasfuchs do schools pay copyright for the knowledge they transfer to their students?
Imagine this: parents drop their kids off at a library, telling them to "read everything". The kids grow up to be highly paid professionals thanks to this...
Would every single author from the books in that library be entitled to compensation from the kids because their success is based on the knowledge from these books?
@thomasfuchs not a fan of OpenAI at all, but neither do I buy this classic copyright maximalist argument. A stolen car is unavailable to the owner; stolen texts or images used to train models are no less available, they're "non-rivalrous."
Many business models are impossible under copyright maximalism, including much of art, if you believe the argument Kirby Ferguson makes in "Everything is a Remix."
@ryanprior sure, if authors would get compensated with, say, a living wage guaranteed by the state from taxation.
We’re not living in some utopia though, and what OpenAI is doing is just plain stealing; they actually admit that that’s what they’re doing.
@thomasfuchs
You and @molly0xfff both nailed this...
https://hachyderm.io/@molly0xfff/111721704587608026
@kurtseifried Serious answer: don’t steal the content.
Just because that is inconvenient for LLM training doesn’t suddenly make it legal. ¯\_(ツ)_/¯
@thomasfuchs Serious question: how do we support smaller companies or individuals training LLMs (e.g. Open Source LLMs) if we have strict copyright enforcement and licensing? There are precious few up-to-date training data sets that are in the Public Domain or licensed under an Open Source license.
Enforcing copyright and licensing for training data will 100% make training larger or up-to-date LLMs impossible for anyone without lawyers and tens of millions of dollars. I don't think that's a good long-term outcome.
@kurtseifried the law says it’s stealing
@thomasfuchs First off, who says this is "stealing"? Copyright fair use is a thing.
Also in Japan it's not stealing:
“regardless of whether it is for non-profit or commercial purposes, whether it is an act other than reproduction, or whether it is content obtained from illegal sites or otherwise.”
https://www.biia.com/japan-goes-all-in-copyright-doesnt-apply-to-ai-training/
@kkarhan @tante@tante.cc @tante@tldr.nettime.org LLMs aren’t people who learn and apply knowledge.
They’re literally throwing whole verbatim copyrighted works into a database and will reproduce them wholesale with minimal changes and without attribution.
@thomasfuchs @tante@tante.cc @tante@tldr.nettime.org *Arbiter Voice* "Were it so easy..."
By that logic every wageworker and capital investor would be lifelong debt peons to schoolbook and textbook authors because they learned and applied the contents of said copyrighted works!
But that's not how any of this works - and we can all be glad for it, because otherwise the #Copyrightmafia would extort everyone as their #racketeering would be the norm, not an opt-in!
https://felixreda.eu/2021/07/github-copilot-is-not-infringing-your-copyright/
@celesteh yeah they’re bullshitting
@thomasfuchs I mean they're right that the copyright system is in a crisis of overreach, but recent works by living authors are not part of that problem.
@thomasfuchs Did they just openly state that their entire business model is illegal?
@jacobat @thomasfuchs These companies claim that their use of copyrighted material is just fine -- that it falls under the "fair use" exemption in US law.
This *may* be true (in terms of being found in court to qualify as such), but if so, the law needs a refresh. "Fair use" was historically a way to allow a person or organization to take a *small amount* of copyrighted material to comment on it, or cite it in research, or make a parody or other highly-legally-protected form of derivative work. It has not, up until now, been invoked on a scale of "essentially all copyrighted works available publicly on the Internet".
@dpnash @jacobat What they’re doing has as much to do with fair use as McDonald's has with fine dining
@delric “in our defense, we disassembled the stolen cars and are working hard to put them randomly back together before selling them.”
@thomasfuchs “I mean, it’s not like we’re going to *pay* for this!”
@cornelius I’m using neural networks for my astrophotography stuff (namely deconvolution and correction of optical aberrations), and it’s specialized software I paid for which was trained on paid-for datasets.
The only thing that’s somehow “impossible” is VC-backed startups properly paying for licensing, because they feel entitled to just steal and scrape everything off the Internet.
@thomasfuchs yes it is simple theft for personal gain. There are plenty of us being chased for AI in fields which don’t have huge convenient free datasets who are wrestling with this, and finding ways to solve it without stealing IP.