Aether (aether@poa.st)'s status on Thursday, 25-Jul-2024 21:25:56 JST
AI needs high-quality human-generated data for training. That comes from the internet. But the internet is becoming increasingly overrun with AI-generated garbage. How screwed is future AI training? Absolutely fucked.
techcrunch.com/2024/07/24/model-collapse-scientists-warn-against-letting-ai-eat-its-own-tail/
>But the thing is, models gravitate toward the most common output. It won't give you a controversial snickerdoodle recipe but the most popular, ordinary one. And if you ask an image generator to make a picture of a dog, it won't give you a rare breed it only saw two pictures of in its training data; you'll probably get a golden retriever or a Lab.
>Now, combine these two things with the fact that the web is being overrun by AI-generated content and that new AI models are likely to be ingesting and training on that content. That means they're going to see a lot of goldens!
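The "lots of goldens" effect can be seen in a toy simulation. This is a hypothetical sketch, not the setup from the Nature paper: the "model" is just a table of dog-breed frequencies (the breed names and starting numbers are made up), and each generation is "trained" only on samples drawn from the previous generation. It captures only the finite-sampling effect. A breed that happens to get zero samples is assigned probability zero and can never come back, so the rare breeds tend to disappear first while the common ones take over.

```python
import random
from collections import Counter

# Hypothetical starting distribution: a "representative idea of dogs".
INITIAL = {
    "golden_retriever": 0.40,
    "labrador": 0.30,
    "beagle": 0.20,
    "whippet": 0.06,
    "otterhound": 0.02,
    "lagotto_romagnolo": 0.02,
}


def next_generation(probs, n_samples, rng):
    """Generate n_samples 'training images' from the current model, then refit
    the model on them by maximum likelihood (normalized counts)."""
    breeds = list(probs)
    samples = rng.choices(breeds, weights=[probs[b] for b in breeds], k=n_samples)
    counts = Counter(samples)
    # A breed with zero samples gets probability 0 and is gone for good.
    return {b: counts[b] / n_samples for b in breeds}


def simulate(initial, generations, n_samples, seed=0):
    """Train each generation only on the previous generation's output."""
    rng = random.Random(seed)
    history = [dict(initial)]
    probs = dict(initial)
    for _ in range(generations):
        probs = next_generation(probs, n_samples, rng)
        history.append(probs)
    return history


def surviving_breeds(probs):
    """Breeds the model can still produce at all."""
    return [b for b, p in probs.items() if p > 0]


if __name__ == "__main__":
    history = simulate(INITIAL, generations=30, n_samples=100, seed=42)
    for gen in (0, 10, 20, 30):
        alive = {b: round(p, 2) for b, p in history[gen].items() if p > 0}
        print(f"generation {gen}: {alive}")
```

The number of surviving breeds can only stay the same or shrink from one generation to the next; smaller `n_samples` (less "training data" per generation) makes the collapse happen faster.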
The paper in Nature is pretty technical.
nature.com/articles/s41586-024-07566-y
...and the supplementary content even more so, but the TechCrunch article has an image that explains everything. In just four steps, the AI goes from a fairly representative idea of dogs to complete garbage.
static-content.springer.com/esm/art%3A10.1038%2Fs41586-024-07566-y/MediaObjects/41586_2024_7566_MOESM1_ESM.pdf