RE: https://mastodon.nz/@cuddlyanarchist/116478712970307954
The problems with imputing "meaning" from atoms of text are deeply ingrained and potentially unsolvable, at least with LLMs as currently implemented.
Predictive inference is just a statistical exercise. If your data set is dirty, skews in a particular direction, has gaps in it, or represents a rapidly changing knowledge domain, the conclusions you draw from it cannot possibly be better than the source.
All of this was a problem at the beginning of the genAI boom a few years ago, but now it's becoming far worse as models train on the slop they themselves spewed out. We've created a giant petri dish and now the culture is feeding on its own waste products. All those gaps, biases, and shaky prior assumptions are merrily spawning their own feedback loops.