Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://mastodon.social/users/persagen/statuses/110826632783594455">Victoria Stuart 🇨🇦 🏳️‍⚧️ (persagen@mastodon.social)'s status on Wednesday, 13-Sep-2023 15:58:47 JST</a><a href="https://mastodon.social/@persagen" title="persagen@mastodon.social"><img src="https://gnusocial.jp/avatar/156189-48-20240908055012.webp" width="48" height="48" alt="Victoria Stuart 🇨🇦 🏳️‍⚧️" style="position: absolute; left: 0; top: 0;">Victoria Stuart 🇨🇦 🏳️‍⚧️</a><div><a href="https://mastodon.social/@persagen/110725040838836588" rel="in-reply-to">in reply to</a></div></section><article><p>Addendum 1</p><p>Theory for Emergence of Complex Skills in Language Models<br><a href="https://arxiv.org/abs/2307.15936" rel="nofollow noreferrer">https://arxiv.org/abs/2307.15936</a></p><p>* new skills emerge in language models when their parameter set and training corpora are scaled up<br>* poorly understood phenomenon; mathematical analysis of gradient-based training is difficult<br>* paper analyzes emergence using scaling laws &amp; a simple statistical framework<br>* mathematical analysis implies a strong form of inductive bias that allows the pre-trained model to learn very efficiently</p><p><a href="https://mastodon.social/tags/LLM" rel="tag">#LLM</a> <a href="https://mastodon.social/tags/emergence" rel="tag">#emergence</a></p></article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/2054100#notice-4031215">In conversation</a><time datetime="2023-09-13T15:58:47+09:00" title="Wednesday, 13-Sep-2023 15:58:47 JST">about a year ago</time> <span>from <span><a href="https://mastodon.social/@persagen/110826632783594455" rel="external" title="Sent from mastodon.social via ActivityPub">mastodon.social</a></span></span><a href="https://mastodon.social/@persagen/110826632783594455">permalink</a><h4>Attachments</h4><ol><li><label><a rel="external" href="https://gnusocial.jp/attachment/1514588">Untitled attachment</a></label><br></li></ol></footer></blockquote>

Corresponding Notice

Victoria Stuart 🇨🇦 🏳️‍⚧️ (persagen@mastodon.social)'s status on Wednesday, 13-Sep-2023 15:58:47 JST Victoria Stuart 🇨🇦 🏳️‍⚧️
in reply to
Addendum 1
Theory for Emergence of Complex Skills in Language Models
https://arxiv.org/abs/2307.15936
* new skills emerge in language models when their parameter set and training corpora are scaled up
* poorly understood phenomenon; mathematical analysis of gradient-based training is difficult
* paper analyzes emergence using scaling laws & a simple statistical framework
* mathematical analysis implies a strong form of inductive bias that allows the pre-trained model to learn very efficiently
#LLM #emergence
In conversation about a year ago from mastodon.social permalink
Attachments
1. Untitled attachment
