Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://mstdn.social/users/rysiek/statuses/115859595821181637">Michał "rysiek" Woźniak · 🇺🇦 (rysiek@mstdn.social)'s status on Thursday, 08-Jan-2026 22:57:36 JST</a><a href="https://mstdn.social/@rysiek" title="rysiek@mstdn.social"><img src="https://gnusocial.jp/avatar/13472-48-20221025105231.webp" width="48" height="48" alt='Michał "rysiek" Woźniak · 🇺🇦' style="position: absolute; left: 0; top: 0;">Michał "rysiek" Woźniak · 🇺🇦</a><div><a href="https://mstdn.social/@rysiek/115859592403908080" rel="in-reply-to">in reply to</a></div></section><article><p>LLMs have no way of distinguishing data from instructions.</p><p>Creators of these systems use all sorts of tricks to try and separate the prompts that define the “guardrails” from other input data, but fundamentally it’s all text, and there is only a single context window.</p><p>Defending from prompt injections is like defending from SQL injections, but there is no such thing as prepared statements, and instead of trying to escape specific characters you have to semantically filter natural language.</p><p>7/🧵</p></article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/6037269#notice-11886849">In conversation</a><time datetime="2026-01-08T22:57:36+09:00" title="Thursday, 08-Jan-2026 22:57:36 JST">about 3 months ago</time> <span>from <span><a href="https://gnusocial.jp/notice/11886849" rel="external" title="Sent from gnusocial.jp via ActivityPub">gnusocial.jp</a></span></span><a href="https://gnusocial.jp/notice/11886849">permalink</a></footer></blockquote>

Corresponding Notice

Embed this notice
Michał "rysiek" Woźniak · 🇺🇦 (rysiek@mstdn.social)'s status on Thursday, 08-Jan-2026 22:57:36 JST Michał "rysiek" Woźniak · 🇺🇦
in reply to
LLMs have no way of distinguishing data from instructions.
Creators of these systems use all sorts of tricks to try and separate the prompts that define the “guardrails” from other input data, but fundamentally it’s all text, and there is only a single context window.
Defending from prompt injections is like defending from SQL injections, but there is no such thing as prepared statements, and instead of trying to escape specific characters you have to semantically filter natural language.
7/🧵
In conversationabout 3 months ago from gnusocial.jppermalink

Public

Embed Notice

HTML Code

Corresponding Notice