Addendum 1
Theory for Emergence of Complex Skills in Language Models
https://arxiv.org/abs/2307.15936
* new skills emerge in language models when their parameter count and training corpora are scaled up
* poorly understood phenomenon; mathematical analysis of gradient-based training difficult
* paper analyzes emergence using scaling laws & simple statistical framework
* mathematical analysis shows that the scaling laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently
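
For intuition about the "scaling laws" mentioned in the notes above, here is a minimal sketch of a Chinchilla-style parametric loss curve, L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D the number of training tokens. The functional form and the constants are assumptions made for illustration (approximately the fit published by Hoffmann et al., 2022); they are not taken from these notes, and the function name `scaling_law_loss` is hypothetical.

```python
def scaling_law_loss(n_params, n_tokens,
                     e=1.69, a=406.4, b=410.7, alpha=0.34, beta=0.28):
    """Assumed parametric loss L(N, D) = E + A / N**alpha + B / D**beta.

    e is the irreducible loss; the other two terms shrink as the model
    (n_params) and the corpus (n_tokens) grow. Constants are illustrative.
    """
    return e + a / n_params**alpha + b / n_tokens**beta


if __name__ == "__main__":
    # Scale parameters and tokens together by 10x at each step.
    for scale in (1, 10, 100):
        n = 1e9 * scale    # parameter count
        d = 20e9 * scale   # training tokens
        print(f"N={n:.0e}  D={d:.0e}  loss={scaling_law_loss(n, d):.3f}")
```

Under this assumed form the predicted loss falls smoothly toward the irreducible term as N and D grow. The paper's framework starts from this kind of smooth reduction in cross-entropy loss when reasoning about competence on skills.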