@eaton
I guess I was thinking that to "reach in" you could just take the intermediate-layer values from a select few of the LLM's layers and mask or compress them down to a smaller matrix. Do that at a few points along the LLM, and use the results as the input to a lower-parameter network that you train (after the LLM is trained) to predict the LLM's time to respond.
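A rough sketch of what that could look like, under some loud assumptions: activations from each probed layer are compressed by mean-pooling over tokens and then multiplying by a fixed random projection, the compressed vectors are concatenated, and a closed-form ridge regression (a linear probe standing in for the "lower-parameter network") is fit to response times. The layer states and targets here are synthetic random data; in a real setup they would come from the frozen LLM (for example, Hugging Face transformers can return per-layer activations with `output_hidden_states=True`) and from logged response times. All names (`compress_layer`, `probe_features`, `fake_layer_states`, the sizes) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, K = 64, 8      # hypothetical LLM hidden size and compressed size per layer
PROBE_LAYERS = 3        # hypothetical number of layers we "reach in" to
N_PROMPTS = 200

def compress_layer(hidden, proj):
    """Mean-pool a (tokens, d_model) activation over tokens, then project down to K dims."""
    return hidden.mean(axis=0) @ proj

def probe_features(layer_states, projections):
    """Concatenate the compressed features from each probed layer."""
    return np.concatenate([compress_layer(h, p) for h, p in zip(layer_states, projections)])

def fit_ridge(X, y, lam=1e-3):
    """Fit a linear probe with a bias term by closed-form ridge regression."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    """Apply the fitted probe (weights plus bias) to feature rows."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

# One fixed random projection per probed layer (assumption: any down-projection would do).
projections = [rng.normal(size=(D_MODEL, K)) / np.sqrt(D_MODEL) for _ in range(PROBE_LAYERS)]

def fake_layer_states(n_tokens):
    """Synthetic stand-in for the frozen LLM's activations at the probed layers."""
    return [rng.normal(size=(n_tokens, D_MODEL)) for _ in range(PROBE_LAYERS)]

# Build a dataset of prompts with varying token counts; targets are synthetic response times.
X = np.stack([
    probe_features(fake_layer_states(int(rng.integers(5, 21))), projections)
    for _ in range(N_PROMPTS)
])
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + 2.0 + rng.normal(scale=0.01, size=N_PROMPTS)

w = fit_ridge(X, y)
mae = np.mean(np.abs(predict(w, X) - y))
print(f"train MAE: {mae:.4f}")
```

Because the LLM stays frozen, only the tiny probe is trained; swapping the linear probe for a small MLP would keep the same feature pipeline.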
Daniel (ergo42@mastodon.gamedev.place)'s status on Monday, 12-Aug-2024 03:28:14 JST