prompt: your name is Bob, you're pleasant to talk to.
user: hey Bob. response: hey user! how are you?
but then later
user: hey Bob! response: Bob says hello. user: aren't you Bob? response: I am an AI that simulates Bob. user: so you're not Bob. response: no. Bob is a person that I act out for you.
so, you see, you now have Bob (who will probably never address you directly again), and you have... the AI that simulates Bob. Which is not the same thing as 'I have Bob'.
@DarkestKale sorry mashed the phone while bookmarking this to engage with later. Now is later!
This is something I've seen too, and I love your term for it.
How are you handling your chatbot's "memory"?
Are you feeding it in via a prompting routine (e.g. feeding back conversation history every round) or using a service that does that quietly in the background?
(At this point I'm wondering if one approach is more susceptible than the other.)
@HauntedOwlbear Silly Tavern lets you - and this is amazingly smart - edit responses.
So when it says 'as an AI...' you can actually regenerate it a few times, pick one that's /closest/ to not being fucked up, then edit out the part you don't like. Then, when it gets bundled into the CW (context window), it's not reinforcing bad behaviour.
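Roughly, the flow looks like this if you're scripting it yourself instead of clicking through Silly Tavern's UI - generate() is just a stand-in for whatever backend you're hitting (Oobabooga's API, llama.cpp, whatever), but the important bit is that only the hand-edited reply ever gets stored:

```python
# Sketch of the regenerate-pick-edit flow; generate() is a stand-in for
# whatever backend you call (Oobabooga API, llama.cpp server, etc.).

def generate(prompt: str) -> str:
    raise NotImplementedError("call your local model here")

def build_prompt(system: str, history: list[str], user_msg: str) -> str:
    return "\n".join([system] + history + [f"user: {user_msg}", "Bob:"])

def reply_with_manual_edit(system: str, history: list[str], user_msg: str, tries: int = 3) -> str:
    prompt = build_prompt(system, history, user_msg)
    drafts = [generate(prompt) for _ in range(tries)]      # regenerate a few times
    for i, draft in enumerate(drafts):
        print(f"[{i}] {draft}")
    pick = drafts[int(input("closest-to-sane draft? "))]   # pick the least broken one
    edited = input("edit it (blank keeps it as-is): ") or pick
    # Only the edited text is stored, so the 'as an AI...' wording
    # never gets bundled back into the context window.
    history.append(f"user: {user_msg}")
    history.append(f"Bob: {edited}")
    return edited
```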
@DarkestKale Ah, this is all super helpful, because I frequently write my own (shitty-but-functional) applications for interacting with local models, so I am absolutely not in the habit of using standard terminology as I'm mostly wrapped up in my home-grown nonsense.
So yeah, what I mostly do is feed the previous entries on both sides of the conversation back to the chatbot (ohai token limit), and while I've seen a few approaches to using databases to manage long- or short-term memory, they're all more labour-intensive than what I'm looking for for my fuckery.
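For reference, my home-grown loop is basically this - a crude chars/4 guess standing in for a real tokenizer, and the oldest turns falling off once the transcript won't fit:

```python
# Minimal "memory via re-prompting": the transcript goes back in every round,
# oldest turns dropped first once the token budget would be blown.
# rough_tokens() is a crude stand-in for a real tokenizer (~4 chars per token).

def generate(prompt: str) -> str:              # stand-in for your backend call
    raise NotImplementedError

def rough_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fit_history(system: str, history: list[str], budget: int = 2048, reserve: int = 256) -> list[str]:
    spend = rough_tokens(system) + reserve     # leave room for the system prompt and the reply
    kept: list[str] = []
    for turn in reversed(history):             # newest turns matter most, so walk backwards
        cost = rough_tokens(turn)
        if spend + cost > budget:
            break
        kept.append(turn)
        spend += cost
    return list(reversed(kept))

def chat_round(system: str, history: list[str], user_msg: str) -> str:
    history.append(f"user: {user_msg}")
    prompt = "\n".join([system] + fit_history(system, history) + ["Bob:"])
    reply = generate(prompt)
    history.append(f"Bob: {reply}")
    return reply
```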
Definitely going to see if I can get SillyTavern talking to Oobabooga based on what you've said.
@HauntedOwlbear Getting back to your query, I've found that delamination mostly occurs when you ask for an opinion on... something.
Mostly, something physical.
That opens up a chance of 'as a sentient AI, I cannot...' type responses. Like, if you're running any kind of temperature on your model, then every time you ask for an opinion it MIGHT spit out the 'as an LLM, I can't give you an opinion' line - and once that line's crossed, the 'persona' is kinda gone.
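If you want to catch those before they get stored, a dumb phrase filter goes a surprisingly long way - this isn't anything Silly Tavern actually does internally, just an illustration of automating the 'regenerate until it's not broken' step:

```python
import re

# Phrases that usually mean the persona has delaminated.
BREAK_PATTERNS = [
    r"\bas an? (sentient )?AI\b",
    r"\bas a (large )?language model\b",
    r"\bI am an AI that simulates\b",
]

def breaks_persona(reply: str) -> bool:
    return any(re.search(p, reply, re.IGNORECASE) for p in BREAK_PATTERNS)

# e.g. keep regenerating while breaks_persona(draft) is True (up to a few tries),
# so the refusal never gets written into the conversation history.
```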
@HauntedOwlbear Naturally, 2k goes fast, so there are a few techniques for working around it. Some boil down to 'ok, don't just keep a plain FIFO list of tokens - be smarter about what stays in'. Then there's RAG - retrieval augmented generation, aka 'fetch a little extra info and send it along with the prompt' - which you CAN do with your own chatlogs, or 'always send X thing with the prompt, every time', which is what Silly Tavern does.
Silly Tavern does something I'd thought of myself, which is that you can define keywords, and when you mention a keyword, it prepends the definition to the prompt.
This is good for characters, locations and so on - but it's a pain to keep all those entries well formatted.
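In spirit (not Silly Tavern's actual code, and the Rusty Flagon entry is just a made-up example), the keyword trick is about this much logic:

```python
# Keyword-triggered lore injection: if a defined keyword shows up in the
# message, its entry gets prepended to what goes into the prompt.

LORE = {
    "bob": "Bob is a cheerful bartender who never breaks character.",
    "rusty flagon": "The Rusty Flagon is the tavern where Bob works.",
}

def inject_lore(user_msg: str, lore: dict[str, str] = LORE) -> str:
    lowered = user_msg.lower()
    hits = [entry for keyword, entry in lore.items() if keyword in lowered]
    return "\n".join(hits + [user_msg]) if hits else user_msg

# >>> inject_lore("Is Bob working at the Rusty Flagon tonight?")
# both entries get prepended above the message before it's sent to the model
```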
Proper RAG is a bit easier to maintain, but shakier in its actual reliability.
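At its most stripped-down, RAG is just 'score your stored chunks against the message, send the best ones along'. Real setups use embeddings and a vector store; this toy version scores by word overlap purely to show the shape - and it also shows why it's shaky, because whatever the scoring misses simply never reaches the model:

```python
# Toy retrieval-augmented generation over your own chatlogs: score stored
# chunks against the new message, prepend the top hits. Real RAG swaps the
# word-overlap score for embedding similarity, but the plumbing is the same.

def overlap(query: str, chunk: str) -> int:
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    ranked = sorted(chunks, key=lambda ch: overlap(query, ch), reverse=True)
    return [ch for ch in ranked[:top_k] if overlap(query, ch) > 0]

def augment(query: str, chunks: list[str]) -> str:
    context = retrieve(query, chunks)
    if not context:
        return query
    return "\n".join(["Relevant earlier conversation:"] + context + ["", query])
```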
@HauntedOwlbear ok, so the 'memory' aspect is interesting. Lemme write a bit for you (assuming you might know some stuff, but gonna rehash anyway cause that's how my brain works).
Typically, the 'memory' of the LLM is called the context window. In most local models, that's about 2k tokens, and that's trained in, so you can't just say 'I have heaps of RAM so MOAR'.
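If you want to see how fast 2k fills up, count it with an actual tokenizer - this uses the public GPT-2 tokenizer from Hugging Face's transformers as a stand-in, so the exact numbers will differ from whatever model you're actually running:

```python
# Counting how much of a 2k context window a transcript eats.
# GPT-2's tokenizer is just a public stand-in; counts differ per model.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 2048    # trained-in limit on many local models; more RAM won't raise it
RESERVE_FOR_REPLY = 256  # tokens you want left over for the model's answer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def tokens_used(system_prompt: str, transcript: list[str]) -> int:
    text = "\n".join([system_prompt] + transcript)
    return len(tokenizer.encode(text)) + RESERVE_FOR_REPLY

transcript = ["user: hey Bob.", "Bob: hey user! how are you?"] * 50
used = tokens_used("Your name is Bob, you're pleasant to talk to.", transcript)
print(f"{used}/{CONTEXT_WINDOW} tokens spoken for before the next message even lands")
```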