this is interesting, but i don't quite agree. i don't think this is model collapse, per se. i believe when you do "search" with an LLM, what you're actually getting is RAG (retrieval-augmented generation): the providers aren't constantly re-training their models on the online content they added to their index over the last 48hrs, they're querying their vectorized index of that content with your vectorized search terms, dumping that context into the LLM, and returning a long, chatty result. https://www.theregister.com/2025/05/27/opinion_column_ai_model_collapse/
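to make the distinction concrete, here's a toy sketch of the RAG-style pipeline i mean. the embed() function, the two indexed pages, and the prompt format are all made up for illustration; a real system uses a trained embedding model and a proper vector database, but the shape is the same: retrieve from an index, stuff the results into the prompt, let the model write the answer.

```python
import math

def embed(text: str) -> list[float]:
    # stand-in for a real embedding model: hash words into a small fixed-size vector
    vec = [0.0] * 64
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # vectors are already normalized, so the dot product is the cosine similarity
    return sum(x * y for x, y in zip(a, b))

# the "index": pages crawled over the last 48hrs, stored as (url, text, vector)
index = [
    (url, text, embed(text))
    for url, text in [
        ("https://example.com/faucets", "how to fix a leaking kitchen faucet"),
        ("https://example.com/gpus", "a review of current GPU prices"),
    ]
]

def llm_search(query: str, k: int = 1) -> str:
    qvec = embed(query)
    # retrieve the k nearest documents to the query from the vectorized index
    hits = sorted(index, key=lambda doc: cosine(qvec, doc[2]), reverse=True)[:k]
    context = "\n".join(f"[{url}] {text}" for url, text, _ in hits)
    # the retrieved context gets dumped into the prompt; the model itself never changes
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# in a real system this prompt is what gets sent to the LLM, which writes the chatty answer
print(llm_search("my kitchen faucet is dripping"))
```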
what we're seeing is actually far worse. it's a general **epistemological collapse**. the open web is getting filled up with garbage LLM content to the point that it is becoming difficult to find useful results, regardless of whether it's an LLM search or a regular search.
the original task of internet search engines was to help users find a needle in a haystack. that was a solvable problem, and they were very good at it **because the haystack was finite**. but now LLMs are generating an infinite haystack of slop and deliberate misinformation.
when biologists cloned sheep in the 1990s, it was a red alert for bio-ethics, and some pretty strict rules were set for what you can do with a human genome, because everyone knew that once some lab-created horror made it out into the population, there was no way to fix it.
i feel like we are at a similar moment with the ecosystem of human knowledge and no one is talking about it like the civilizational emergency it is.
if you have a library of 10,000 precious books containing thousands of years of human knowledge, and that library burns down, it's a tragedy. and it's the **exact same effect** if, instead of burning them, you mix those 10,000 books into a sea of 1,000,000,000 books that look exactly like them but contain fabricated content with no traceable record of who created them or why.
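to put rough numbers on that dilution, using the figures from the analogy:

```python
# back-of-envelope on the library analogy above
real_books = 10_000
fake_books = 1_000_000_000
share = real_books / (real_books + fake_books)
print(f"{share:.4%} of the shelves hold a real book")            # ~0.0010%
print(f"on average you'd pull ~{round(1 / share):,} books to find one real one")
```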
this is what's at risk with LLM-generated content, and we badly need some kind of guidelines and a project to archive original, human-produced knowledge before it's too late and it becomes impossible to extract it from an ocean of random language.
people try to draw comparisons to the printing press or whatever, but the **scale** of what computers are able to produce is insane. millions of GPUs, each one capable of producing millions of words of language per day given a steady supply of electricity. i'll be shocked if internet search is still usable at all 18 months from now.
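for what it's worth, the rough arithmetic on that scale claim, using my own ballpark numbers above (the wikipedia comparison is approximate):

```python
# back-of-envelope on the generation-scale claim; inputs are ballpark guesses
gpus = 2_000_000                      # "millions of GPUs"
words_per_gpu_per_day = 1_000_000     # "millions of words of language per day"
words_per_day = gpus * words_per_gpu_per_day
print(f"{words_per_day:.1e} words per day")                  # ~2e12

# english wikipedia is on the order of 4-5 billion words *total*, so this is
# hundreds of wikipedias' worth of new text every single day
wikipedia_words = 4.5e9
print(f"~{words_per_day / wikipedia_words:.0f} english-wikipedias per day")
```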
transphobia endangers **all** women. and that is partially the point: the fascists want to go back to a world where women are required to wear skirts and grow their hair long, and where men are banned from wearing nail polish and skirts.
i'm interested in one-way veils coming back as a hot new fashion accessory. or preferably, i'm interested in the EU banning facial recognition technology in wearable cameras.
also, i love that Joe is absolutely putting the Meta spokesgoon on blast throughout the piece. fuck these assholes, they shouldn't be able to get away with lying without paying a price. people who work for Meta are psychopaths.
here's what Dave Arnold was doing before he became the Public Affairs Director for Meta. psychopath. guy has zero humanity, and he's been rewarded **handsomely** for it.