@HauntedOwlbear ok, so the 'memory' aspect is interesting. Lemme write a bit for you (assuming you might know some stuff, but gonna rehash anyway cause that's how my brain works).
Typically, the 'memory' of an LLM is the context window: the chunk of recent text (measured in tokens) that the model can actually see at once. In most local models that's around 2k tokens, and it's baked in at training time, so you can't just say 'I have heaps of RAM so MOAR' and get a bigger one.
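To make that concrete, here's a rough sketch (not any real library's API, and using a crude ~4-chars-per-token estimate instead of a real tokenizer) of what a fixed context window means in practice: once the chat history outgrows the window, the oldest turns just fall out of 'memory'.

```python
CONTEXT_WINDOW = 2048  # tokens; fixed by the model, more RAM doesn't change it

def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 chars per token); a real tokenizer would be exact.
    return max(1, len(text) // 4)

def build_prompt(history: list[str], new_message: str) -> str:
    """Keep only as many recent turns as fit in the window."""
    budget = CONTEXT_WINDOW - estimate_tokens(new_message)
    kept = []
    for turn in reversed(history):      # walk from newest to oldest
        cost = estimate_tokens(turn)
        if cost > budget:
            break                       # older turns get dropped: the model 'forgets' them
        kept.append(turn)
        budget -= cost
    return "\n".join(list(reversed(kept)) + [new_message])
```

So the model isn't remembering anything between turns; the front end just keeps re-stuffing as much recent conversation as will fit into that fixed-size window.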