@feld @yogthos @oz yeah, filtering out bad data and creating good data is the main problem in the field right now. many think that synthetic data generation is the way to go, and both llama3 and WizardLM use it. adding 'tarpits' to random indieweb blogs will not do much.
@yogthos so, if you don't like seeing content from LLMs, a solution is to publish lots of LLM-generated content, in the hope that the models will eventually collapse.
It looks like folks are already making tarpit webpages specifically for (Open)AI bots, in an effort to feed garbage to the model. 🤔
@oz @yogthos the models are not going to work well unless the data they're trained on is carefully curated. The LLMs that are trained on high-quality data are really, really good.