LLMs have no way of distinguishing data from instructions.
Creators of these systems use all sorts of tricks to try and separate the prompts that define the “guardrails” from other input data, but fundamentally it’s all text, and there is only a single context window.
Defending from prompt injections is like defending from SQL injections, but there is no such thing as prepared statements, and instead of trying to escape specific characters you have to semantically filter natural language.
7/🧵