@icedquinn, yes and no. Our RL frameworks, as popularized by Sutton & Barto decades ago, are not what biological life does. They're not even what LLMs do: LLMs, and deep transformers in general, can do in-context RL near-optimally, which is far beyond what we achieve with classic RL.
Classic RL doesn't work for a few reasons:
- Sparse rewards are the only signal guiding actions. That works for simple games, not for the real world. Animals don't learn from sparse rewards; they have complex intents and complex notions of success and failure, not just pain and pleasure.
- Classic RL singles out the first person as the agent, which cannot learn from other agents. That framework fits simple games, where there is no heterogeneous ocean of other agents. The real world does have that ocean of agency, though, and it can be exploited/mined for third-person experience. Monkey see, monkey do.
- Sequential, discrete actions are too simple a framework to control real-world bodies in any non-trivial fashion.
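To make the criticism concrete, here is a minimal sketch of the classic agent-environment loop the bullets above are about. The gridworld and its parameters are hypothetical, chosen only to make the three assumptions visible in code: one scalar reward that is zero almost everywhere, a discrete action set, and a single first-person agent with no one else to imitate.

```python
import random

class SparseGridworld:
    """1-D corridor: reward is 0 everywhere except at the goal (sparse reward)."""
    def __init__(self, length=12):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Actions are sequential and discrete: 0 = left, 1 = right.
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        # The ONLY learning signal in this framework is this one scalar:
        reward = 1.0 if self.pos == self.length - 1 else 0.0
        done = reward > 0
        return self.pos, reward, done

def run_episode(env, policy, max_steps=30):
    """One first-person agent; there are no other agents to learn from."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total

random.seed(0)
env = SparseGridworld()
# A random policy mostly wanders and never reaches the goal, so whole
# episodes go by with zero signal -- the sparse-reward problem.
returns = [run_episode(env, lambda s: random.choice([0, 1])) for _ in range(20)]
print(returns)
```

Everything an agent could ever learn here must be squeezed out of that single `reward` float; intent, imitation, and continuous motor control have no place in the interface.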
I can design a better, modern RL framework that is suitable for the real world, though.