I’m hoping that some open source project will start collecting and distributing _trusted data collections_ as raw material to train LLMs on. That’s where the value is, not in the models themselves, which are mostly trying to un-shittify the shit they got fed.
To me, the biggest issue with the whole thing is that _I do not want something trained on the entirety of the crap out there_. We _all_ know that most available “content” is biased, incorrect, racist, ignorant, hot garbage.