@Moon @thendrix @waifu yeah, what happens there is that your original input scrolls out of the context window.
you have to take active measures to deal with that. what measures those are is an open question.
mistral wants you to do some "sliding window" context: every 4k tokens it crunches the context into a new set of tokens that is supposed to somehow carry some of the earlier context. so it still maintains the same size of context window, but it buffers itself with custom tokens to try to remember recent history.
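a toy sketch of that "crunch the old half, keep the recent half" idea, assuming a tiny 8-token window and a stand-in summarizer (real systems compress with the model itself, not like this):

```python
WINDOW = 8  # toy stand-in for the real 4k-token window

def crunch(tokens):
    # toy "summary": keep only the first and last old token.
    # a real system would produce learned compression tokens here.
    return tokens[:1] + tokens[-1:]

class SlidingContext:
    def __init__(self):
        self.buffer = []

    def push(self, token):
        self.buffer.append(token)
        if len(self.buffer) > WINDOW:
            # window overflowed: compress the oldest half into
            # summary tokens, keep the recent half verbatim
            old, recent = self.buffer[:WINDOW // 2], self.buffer[WINDOW // 2:]
            self.buffer = crunch(old) + recent

ctx = SlidingContext()
for t in range(20):
    ctx.push(f"t{t}")

print(len(ctx.buffer))  # stays bounded near WINDOW instead of growing to 20
```

the point is just the shape of it: the buffer never grows without bound, but older history only survives as lossy summary tokens.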
ggml folk have some other experimental fuzzy/infinite context feature that was merged. i tried to find the PR but it took too long.
llamaindex et al. instead do some fingerprinting of output context, store those sentences and such into a vector database, and do top-k retrieval: whenever you input new text they grab the most relevant stored sentences as 'hints' and stuff them into the prompt.
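a minimal sketch of that top-k retrieval loop, with a bag-of-words counter standing in for a real embedding model (llamaindex actually uses learned sentence embeddings and a proper vector store; everything here is illustrative):

```python
import math
from collections import Counter

def embed(text):
    # toy "embedding": a bag-of-words count vector,
    # NOT what a real embedding model produces
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self):
        self.entries = []  # (embedding, sentence) pairs

    def add(self, sentence):
        self.entries.append((embed(sentence), sentence))

    def top_k(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [s for _, s in ranked[:k]]

store = VectorStore()
store.add("the cat sat on the mat")
store.add("gradient descent minimizes a loss function")
store.add("the cat chased the mouse")

# retrieve the k most relevant stored sentences and stuff
# them into the prompt as hints
hints = store.top_k("where did the cat sit", k=2)
prompt = "\n".join(hints) + "\nUser: where did the cat sit?"
```

same trade-off as before: nothing ever leaves the window by force, but the model only sees whatever the retriever happened to rank highly.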