@icedquinn, yes and no. Our RL frameworks, as popularized by Sutton & Barto decades ago, are not what biological life does. They're not even what LLMs do: LLMs, and deep transformers in general, can do in-context RL near-optimally, which is far beyond what we achieve with classic RL.
Classic RL doesn't work for a few reasons:
- Sparse rewards are the only signal guiding actions. That works for simple games, not for the real world. Animals don't learn from sparse rewards; they have complex intents and complex notions of success and failure, not just pain and pleasure.
- Classic RL singles out the first person as the agent, which cannot learn from other agents. That framework fits simple games, where there is no heterogeneous ocean of other agents. The real world does have that ocean of agency, though, and it can be exploited/mined for third-person experience. Monkey see, monkey do.
- Sequential, discrete actions are too simple a framework to control real-world bodies in any non-trivial fashion.
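To make the criticism concrete, here is a minimal sketch of the classic agent-environment loop the bullets above are about. The gridworld and its parameters are hypothetical, chosen only to make the three assumptions visible in code: one scalar reward that is zero almost everywhere, a discrete action set, and a single first-person agent with no one else to imitate.

```python
import random

class SparseGridworld:
    """1-D corridor: reward is 0 everywhere except at the goal (sparse reward)."""
    def __init__(self, length=12):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Actions are sequential and discrete: 0 = left, 1 = right.
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        # The ONLY learning signal in this framework is this one scalar:
        reward = 1.0 if self.pos == self.length - 1 else 0.0
        done = reward > 0
        return self.pos, reward, done

def run_episode(env, policy, max_steps=30):
    """One first-person agent; there are no other agents to learn from."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total

random.seed(0)
env = SparseGridworld()
# A random policy mostly wanders and never reaches the goal, so whole
# episodes go by with zero signal -- the sparse-reward problem.
returns = [run_episode(env, lambda s: random.choice([0, 1])) for _ in range(20)]
print(returns)
```

Everything an agent could ever learn here must be squeezed out of that single `reward` float; intent, imitation, and continuous motor control have no place in the interface.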
I can design a better, modern RL framework that is suitable for the real world, though.