Let me train my models locally on archived Wikipedia. Let me toss in my own trusted sources like MDN.
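The first step in that workflow would be pulling article text out of a downloaded dump. A minimal sketch using only the stdlib, streaming pages from a MediaWiki XML export so the whole dump never sits in memory; the inline sample document and the export namespace version are illustrative assumptions, not taken from any real dump:

```python
import io
import xml.etree.ElementTree as ET

# Tiny inline stand-in for a real dump file (illustrative data only).
SAMPLE_DUMP = io.BytesIO(b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Example article text.</text></revision>
  </page>
</mediawiki>""")

NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def iter_pages(source):
    """Yield (title, text) pairs, streaming so huge dumps stay cheap."""
    for _, elem in ET.iterparse(source):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text")
            yield title, text
            elem.clear()  # release the parsed subtree as we go

pages = list(iter_pages(SAMPLE_DUMP))
```

From there you'd filter and clean these pages into the curated dataset before any training.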
I think the real source of value in all of this is _well curated datasets_. The hype around the models themselves will largely die out. The initial strategy of feeding in enormous amounts of any-quality data has, IMO, a narrow viable use case: "understanding what is being asked", not "answering the question being asked".