@lxo @chaz It's a good approach, but I don't think it's needed.
If we start from first principles, there's no doubt that to fully exercise the freedoms to study and modify, you need the training data. (You can exercise *some* of those freedoms even without the training data, but only in a suboptimal way. I can give precise examples if you're curious.)
The "data is too big" problem is IMO a distraction. There are relevant ML systems that are small enough to make retraining them from scratch viable.